Re: 6.12.30 x86_64 BPF_LSM doesn't work without (?) fentry/mcount config options

On Mon, Aug 25, 2025 at 5:03 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Thu, Aug 21, 2025 at 03:14:04PM -0700, Maciej Żenczykowski wrote:
> > This is on an Android 6.12 GKI kernel for cuttlefish VM, but the
> > Android-ness probably shouldn't matter.
> >
> > $ cat config.bad | grep BPF_LSM
> > CONFIG_BPF_LSM=y
> >
> > Trying to attach bpf_lsm_bpf_prog_load hook fails with -EBUSY (always).
> >
> > So I decided to enable function graph with retval tracing to track
> > down the source of EBUSY... and it started reliably working.
>
> hi,
> I don't really have a solution for you, just a few notes below
>
> >
> > $ diff config.bad config.works | egrep '^[<>]' | sort
> > < 6.12.30-android16-5-g5c72e9fabab7-ab13770814
> > < # CONFIG_ENABLE_DEFAULT_TRACERS is not set
> > < # CONFIG_FUNCTION_TRACER is not set
> >
> > > 6.12.30-android16-5-maybe-dirty
> > > # CONFIG_FPROBE is not set
> > > # CONFIG_FTRACE_RECORD_RECURSION is not set
> > > # CONFIG_FTRACE_SORT_STARTUP_TEST is not set
> > > # CONFIG_FTRACE_STARTUP_TEST is not set
> > > # CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING is not set
> > > # CONFIG_FUNCTION_PROFILER is not set
> > > # CONFIG_HID_BPF is not set
> > > # CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
> > > # CONFIG_LIVEPATCH is not set
> > > # CONFIG_PSTORE_FTRACE is not set
> > > CONFIG_BUILDTIME_MCOUNT_SORT=y
> > > CONFIG_DYNAMIC_FTRACE=y
> > > CONFIG_DYNAMIC_FTRACE_WITH_ARGS=y
> > > CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
> > > CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
> > > CONFIG_FTRACE_MCOUNT_RECORD=y
> > > CONFIG_FTRACE_MCOUNT_USE_OBJTOOL=y
> > > CONFIG_FUNCTION_GRAPH_RETVAL=y
> > > CONFIG_FUNCTION_GRAPH_TRACER=y
> > > CONFIG_FUNCTION_TRACER=y
>
> CONFIG_FUNCTION_TRACER should enable the build with -mnop-mcount or
> -mfentry, which ends up with a nop5 at the function entry
>
> > > CONFIG_GENERIC_TRACER=y
> > > CONFIG_HAVE_FUNCTION_GRAPH_RETVAL=y
> > > CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
> > > CONFIG_KPROBES_ON_FTRACE=y
> > > CONFIG_TASKS_RUDE_RCU=y
> >
> > The above config diff makes it work (ie. it no longer fails with EBUSY).
> >
> > I'm not sure which exact option fixes things
> > (I can test if someone has a more specific guess)
> >
> > However, I've tracked it down via extensive printk debugging to a pair
> > of 'bugs'.
> >
> > (a) there is no 5-byte nop in the bpf_lsm hooks in the bad config
> > (confirmed via disassembly).
> > This can be fixed (worked around) via:
> >
> > diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> > index 3bc61628ab25..f38744df79c8 100644
> > --- a/kernel/bpf/bpf_lsm.c
> > +++ b/kernel/bpf/bpf_lsm.c
> > @@ -21,7 +21,8 @@
> >   * function where a BPF program can be attached.
> >   */
> >  #define LSM_HOOK(RET, DEFAULT, NAME, ...)      \
> > -noinline RET bpf_lsm_##NAME(__VA_ARGS__)       \
> > +noinline __attribute__((patchable_function_entry(5))) \
> > +RET bpf_lsm_##NAME(__VA_ARGS__) \
> >  {                                              \
> >         return DEFAULT;                         \
> >  }
> >
> > [this is likely wrong for non-x86_64, should probably be
> > conditional on something, and may be clang-specific...]
>
> we used to do something remotely similar with patchable_function_entry
> for bpf_dispatcher [1], but ended up with changing that later [2]
>
> [1] ceea991a019c bpf: Move bpf_dispatcher function out of ftrace locations
>     dbe69b299884 bpf: Fix dispatcher patchable function entry to 5 bytes nop
>
> [2] 18acb7fac22f bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")
>     c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)
>
> >
> > AFAICT the existence of a bpf_lsm hook makes no sense without the nop in it.
> > So this is imho a kernel bug.
>
> yes, all the bpf_lsm_* hooks are there as attachment points for lsm bpf
> programs, and they attach as bpf tracing programs ... and we have the
> ftrace subsystem that keeps track of and manages the nop5 at function
> entry for tracing purposes.. hence the ftrace dependency
>
> perhaps we could generate nops for the set of bpf_lsm_* functions if FTRACE
> is not enabled, but the attachment code already depends on ftrace setup, as
> you found below with the nop5, and I wonder if there will be other surprises
>
> cc Steven
>
> >
> > (b) this results in a *different* (but valid) 5-byte nop than the
> > kernel expects (confirmed by disassembly).
> >
> > 0f 1f 44 00 08   nopl 0x8(%rax,%rax,1)
> > vs
> > 0f 1f 44 00 00   nopl 0x0(%rax,%rax,1)
> >
> > It looks to be a compiler (clang, I believe) vs tooling (objdump or the
> > kernel itself?) type of problem.
> > [MCOUNT_USE_OBJTOOL makes me think the compiler is being instructed to
> > add mcounts that are then replaced with nops, possibly at build time
> > via objtool?]
>
> ftrace initializes that at start, check ftrace_init
>
> jirka
>
> >
> > Anyway, this can be fixed via:
> >
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index ccb2f7703c33..e782acbe6ada 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -575,7 +575,8 @@ static int emit_jump(u8 **pprog, void *func, void *ip)
> >  static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
> >                                 void *old_addr, void *new_addr)
> >  {
> > -       const u8 *nop_insn = x86_nops[5];
> > +       const u8 nop5alt[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x08 };
> > +       const u8 *nop_insn = nop5alt;
> >         u8 old_insn[X86_PATCH_SIZE];
> >         u8 new_insn[X86_PATCH_SIZE];
> >         u8 *prog;
> >
> > Any thoughts on how to fix this more correctly?
> >
> > I'm guessing:
> > (a) the bpf lsm hooks should be tagged with some 'traceable' macro,
> > which should be arch-specific and empty once mcount/fentry or
> > something are enabled.
> > (b) I'm guessing the x86 poke function needs to accept multiple forms
> > of 5 byte pokes.
> >
> > But I'm not really sure what would be acceptable here in terms of
> > *how* to actually implement this.

One possible way I see to fix this, on x86-64 at least, is to do something like
what 'bpf: Fix dispatcher patchable function entry to 5 bytes nop' did
during early (still single-threaded) kernel startup, and rewrite:
      55               push   %rbp
      48 89 e5         mov    %rsp,%rbp
      31 c0            xor    %eax,%eax
      5d               pop    %rbp
 with an equivalent number of bytes:
      0f 1f 44 00 00   nopl   0x0(%rax,%rax,1)   (this is x86_nops[5])
      31 c0            xor    %eax,%eax
which means later we will find the 5-byte nop we need...

Though this does require explicitly listing all the bpf lsm hooks so
we can iterate through them...

Unfortunately it's x86-specific, and I think/believe (again, I haven't
had time to test) that arm64 has a similar issue.
Except there's likely not enough room there to play these sorts of games.

I did initially think I could do that at run time, when we actually try
to poke, but that looks hard to do correctly on SMP...

So even though this would work, I think it would be better if we had
some actual way to flag bpf lsm hooks as needing the right number of
nops for the arch even with DYNAMIC_FTRACE off...
(and why not just enable it?  Well, I haven't tried pushing that past
our security team yet, but it feels like it might be an uphill battle)




