On Tue, Jul 15, 2025 at 2:31 AM Menglong Dong <menglong.dong@xxxxxxxxx> wrote:
>
> Following are the test results for fentry-multi:
>   36.36%  bpf_prog_2dcccf652aac1793_bench_trigger_fentry_multi  [k] bpf_prog_2dcccf652aac1793_bench_trigger_fentry_multi
>   20.54%  [kernel]                    [k] migrate_enable
>   19.35%  [kernel]                    [k] bpf_global_caller_5_run
>    6.52%  [kernel]                    [k] bpf_global_caller_5
>    3.58%  libc.so.6                   [.] syscall
>    2.88%  [kernel]                    [k] entry_SYSCALL_64
>    1.50%  [kernel]                    [k] memchr_inv
>    1.39%  [kernel]                    [k] fput
>    1.04%  [kernel]                    [k] migrate_disable
>    0.91%  [kernel]                    [k] _copy_to_user
>
> And I also did the testing for fentry:
>   54.63%  bpf_prog_2dcccf652aac1793_bench_trigger_fentry  [k] bpf_prog_2dcccf652aac1793_bench_trigger_fentry
>   10.43%  [kernel]                    [k] migrate_enable
>   10.07%  bpf_trampoline_6442517037   [k] bpf_trampoline_6442517037
>    8.06%  [kernel]                    [k] __bpf_prog_exit_recur
>    4.11%  libc.so.6                   [.] syscall
>    2.15%  [kernel]                    [k] entry_SYSCALL_64
>    1.48%  [kernel]                    [k] memchr_inv
>    1.32%  [kernel]                    [k] fput
>    1.16%  [kernel]                    [k] _copy_to_user
>    0.73%  [kernel]                    [k] bpf_prog_test_run_raw_tp

Let's pause the fentry-multi stuff and fix this as a higher priority.
Since migrate_disable/enable is so hot in both your tests and mine,
let's figure out how to inline it.

As far as I can see, both functions can be moved to a header file,
including the this_rq() macro, but we need to keep struct rq private
to sched.h. Moving the whole thing is not an option.
Luckily we only need nr_pinned from there.
Maybe we can compute offsetof(struct rq, nr_pinned) in a precompile step,
the way it's done for asm-offsets?
And then use that constant to do nr_pinned++ and nr_pinned--.
__set_cpus_allowed_ptr() is a slow path and can stay in the .c file.

Maybe Peter has better ideas?
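
To make the offsetof idea a bit more concrete, here is a rough userspace
sketch, not kernel code. It only shows the shape: the inline helpers never
see the layout of struct rq, they touch nr_pinned through an opaque pointer
plus a constant. Everything named here is hypothetical (RQ_NR_PINNED_OFFSET,
rq_nr_pinned(), struct task, sketch_migrate_disable/enable, the nr_running
filler field). In a real patch the constant would presumably come out of a
generated header rather than a direct offsetof(), and the preemption guard,
this_rq() and the __set_cpus_allowed_ptr() slow path are left out.

```c
#include <assert.h>
#include <stddef.h>

/* "Private" side: stand-in for the scheduler-only struct rq.
 * The field set is hypothetical; only nr_pinned matters here. */
struct rq {
	unsigned long nr_running;
	unsigned int nr_pinned;
};

/* Stand-in for a build-time generated constant. Computed directly
 * here so the sketch compiles on its own (assumption: a real patch
 * would emit it from a precompile step instead). */
#define RQ_NR_PINNED_OFFSET offsetof(struct rq, nr_pinned)

/* "Header" side: from here on the code uses only an opaque pointer
 * and the constant, never the members of struct rq. */
struct task {
	unsigned short migration_disabled;	/* nesting depth */
};

static inline unsigned int *rq_nr_pinned(void *rq)
{
	return (unsigned int *)((char *)rq + RQ_NR_PINNED_OFFSET);
}

static inline void sketch_migrate_disable(struct task *p, void *rq)
{
	if (p->migration_disabled) {
		/* Nested call: only bump the per-task depth. */
		p->migration_disabled++;
		return;
	}
	(*rq_nr_pinned(rq))++;
	p->migration_disabled = 1;
}

static inline void sketch_migrate_enable(struct task *p, void *rq)
{
	assert(p->migration_disabled > 0);
	if (p->migration_disabled > 1) {
		/* Still nested: only drop the per-task depth. */
		p->migration_disabled--;
		return;
	}
	/* The affinity-restore slow path would be called here (omitted). */
	p->migration_disabled = 0;
	(*rq_nr_pinned(rq))--;
}
```

Only the outermost disable/enable pair touches the runqueue counter, so the
common nested case stays a plain per-task increment or decrement.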