On Mon, Aug 25, 2025 at 5:03 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Thu, Aug 21, 2025 at 03:14:04PM -0700, Maciej Żenczykowski wrote:
> > This is on an Android 6.12 GKI kernel for the cuttlefish VM, but the
> > Android-ness probably shouldn't matter.
> >
> > $ cat config.bad | grep BPF_LSM
> > CONFIG_BPF_LSM=y
> >
> > Trying to attach the bpf_lsm_bpf_prog_load hook always fails with -EBUSY.
> >
> > So I decided to enable the function graph tracer with retval tracing to
> > track down the source of the EBUSY... and it started reliably working.
>
> hi,
> I don't really have a solution for you, just a few notes below
>
> > $ diff config.bad config.works | egrep '^[<>]' | sort
> > < 6.12.30-android16-5-g5c72e9fabab7-ab13770814
> > < # CONFIG_ENABLE_DEFAULT_TRACERS is not set
> > < # CONFIG_FUNCTION_TRACER is not set
> > > 6.12.30-android16-5-maybe-dirty
> > > # CONFIG_FPROBE is not set
> > > # CONFIG_FTRACE_RECORD_RECURSION is not set
> > > # CONFIG_FTRACE_SORT_STARTUP_TEST is not set
> > > # CONFIG_FTRACE_STARTUP_TEST is not set
> > > # CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING is not set
> > > # CONFIG_FUNCTION_PROFILER is not set
> > > # CONFIG_HID_BPF is not set
> > > # CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
> > > # CONFIG_LIVEPATCH is not set
> > > # CONFIG_PSTORE_FTRACE is not set
> > > CONFIG_BUILDTIME_MCOUNT_SORT=y
> > > CONFIG_DYNAMIC_FTRACE=y
> > > CONFIG_DYNAMIC_FTRACE_WITH_ARGS=y
> > > CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
> > > CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
> > > CONFIG_FTRACE_MCOUNT_RECORD=y
> > > CONFIG_FTRACE_MCOUNT_USE_OBJTOOL=y
> > > CONFIG_FUNCTION_GRAPH_RETVAL=y
> > > CONFIG_FUNCTION_GRAPH_TRACER=y
> > > CONFIG_FUNCTION_TRACER=y
>
> CONFIG_FUNCTION_TRACER should enable the build with -mnop-mcount or
> -mfentry, which ends up with a nop5 at the function entry
>
> > > CONFIG_GENERIC_TRACER=y
> > > CONFIG_HAVE_FUNCTION_GRAPH_RETVAL=y
> > > CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
> > > CONFIG_KPROBES_ON_FTRACE=y
> > > CONFIG_TASKS_RUDE_RCU=y
> >
> > The
> > above config diff makes it work (i.e. it no longer fails with EBUSY).
> >
> > I'm not sure which exact option fixes things
> > (I can test if someone has a more specific guess).
> >
> > However, I've tracked it down via extensive printk debugging to a pair
> > of 'bugs'.
> >
> > (a) there is no 5-byte nop in the bpf_lsm hooks in the bad config
> > (confirmed via disassembly).
> > This can be fixed (worked around) via:
> >
> > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > index 3bc61628ab25..f38744df79c8 100644
> > --- a/kernel/bpf/bpf_lsm.c
> > +++ b/kernel/bpf/bpf_lsm.c
> > @@ -21,7 +21,8 @@
> >   * function where a BPF program can be attached.
> >   */
> >  #define LSM_HOOK(RET, DEFAULT, NAME, ...)              \
> > -noinline RET bpf_lsm_##NAME(__VA_ARGS__)               \
> > +noinline __attribute__((patchable_function_entry(5)))  \
> > +RET bpf_lsm_##NAME(__VA_ARGS__)                        \
> >  {                                                      \
> >          return DEFAULT;                                \
> >  }
> >
> > [this is likely wrong for non-x86_64, should probably be
> > conditional on something, and may be clang-specific...]
>
> we used to do something remotely similar with patchable_function_entry
> for the bpf_dispatcher [1], but ended up changing that later [2]
>
> [1] ceea991a019c bpf: Move bpf_dispatcher function out of ftrace locations
>     dbe69b299884 bpf: Fix dispatcher patchable function entry to 5 bytes nop
>
> [2] 18acb7fac22f bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")
>     c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)
>
> > AFAICT the existence of a bpf_lsm hook makes no sense without the nop in it.
> > So this is imho a kernel bug.
>
> yes, all the bpf_lsm_* hooks are there for lsm bpf programs as
> attachment points and they attach as bpf tracing programs ... and we
> have the ftrace subsystem that keeps track of and manages the nop5 at
> the function entry for tracing purposes..
> hence the ftrace dependency
>
> perhaps we could generate nops for the set of bpf_lsm_* functions if
> FTRACE is not enabled, but the attachment code already depends on the
> ftrace setup, as you found below with the nop5, and I wonder whether
> there will be other surprises
>
> cc Steven
>
> > (b) this results in a *different* (but valid) 5-byte nop than the
> > kernel expects (confirmed by disassembly):
> >
> > 0f 1f 44 00 08
> > vs
> > 0f 1f 44 00 00
> >
> > It looks to be a compiler (clang, I believe) vs tooling (objdump or the
> > kernel itself?) type of problem.
> > [MCOUNT_USE_OBJTOOL makes me think the compiler is being instructed to
> > add mcount calls that are then replaced with nops, possibly at build
> > time via objtool????]
>
> ftrace initializes that at start, check ftrace_init
>
> jirka
>
> > Anyway, this can be fixed via:
> >
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index ccb2f7703c33..e782acbe6ada 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -575,7 +575,8 @@ static int emit_jump(u8 **pprog, void *func, void *ip)
> >  static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
> >                                  void *old_addr, void *new_addr)
> >  {
> > -        const u8 *nop_insn = x86_nops[5];
> > +        const u8 nop5alt[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x08 };
> > +        const u8 *nop_insn = nop5alt;
> >          u8 old_insn[X86_PATCH_SIZE];
> >          u8 new_insn[X86_PATCH_SIZE];
> >          u8 *prog;
> >
> > Any thoughts on how to fix this more correctly?
> >
> > I'm guessing:
> > (a) the bpf lsm hooks should be tagged with some 'traceable' macro,
> > which should be arch-specific and empty once mcount/fentry or
> > something similar is enabled.
> > (b) the x86 poke function needs to accept multiple forms of
> > 5-byte pokes.
> >
> > But I'm not really sure what would be acceptable here in terms of
> > *how* to actually implement this.
One possible way I see to fix this, on x86-64 at least, is to do
something like what 'bpf: Fix dispatcher patchable function entry to
5 bytes nop' did during early (still single-threaded) kernel startup,
and rewrite:

  55             push %rbp
  48 89 e5       mov  %rsp,%rbp
  31 c0          xor  %eax,%eax
  5d             pop  %rbp

with an equivalent number of bytes:

  0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   (this is x86_nops[5])
  31 c0          xor  %eax,%eax

which means later we will find the 5-byte nop we need...

Though this does require explicitly listing all the bpf lsm hooks so we
can iterate through them...

Unfortunately it's x86-specific, and I think/believe (again, I haven't
had time to test) that arm64 has a similar issue. Except there are
likely not enough bytes there to play these sorts of games.

I did initially think I could do this at run time, when we actually try
to poke, but that looks hard to do correctly on SMP...

So even though this would work, I think it would be better if we had
some actual way to flag bpf lsm hooks as needing the right number of
nops for the arch, even with DYNAMIC_FTRACE off...

(and why not just enable it? Well, I haven't tried pushing that past our
security team yet, but it feels like it might be an uphill battle)