Re: 6.12.30 x86_64 BPF_LSM doesn't work without (?) fentry/mcount config options

Jiri Olsa <olsajiri@xxxxxxxxx> · Mon, 25 Aug 2025 14:03:44 +0200

On Thu, Aug 21, 2025 at 03:14:04PM -0700, Maciej Żenczykowski wrote:
> This is on an Android 6.12 GKI kernel for cuttlefish VM, but the
> Android-ness probably shouldn't matter.
> 
> $ cat config.bad | grep BPF_LSM
> CONFIG_BPF_LSM=y
> 
> Trying to attach bpf_lsm_bpf_prog_load hook fails with -EBUSY (always).
> 
> So I decided to enable function graph with retval tracing to track
> down the source of EBUSY... and it started reliably working.

hi,
I don't really have a solution for you, just few notes below

> 
> $ diff config.bad config.works | egrep '^[<>]' | sort
> < 6.12.30-android16-5-g5c72e9fabab7-ab13770814
> < # CONFIG_ENABLE_DEFAULT_TRACERS is not set
> < # CONFIG_FUNCTION_TRACER is not set
> 
> > 6.12.30-android16-5-maybe-dirty
> > # CONFIG_FPROBE is not set
> > # CONFIG_FTRACE_RECORD_RECURSION is not set
> > # CONFIG_FTRACE_SORT_STARTUP_TEST is not set
> > # CONFIG_FTRACE_STARTUP_TEST is not set
> > # CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING is not set
> > # CONFIG_FUNCTION_PROFILER is not set
> > # CONFIG_HID_BPF is not set
> > # CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
> > # CONFIG_LIVEPATCH is not set
> > # CONFIG_PSTORE_FTRACE is not set
> > CONFIG_BUILDTIME_MCOUNT_SORT=y
> > CONFIG_DYNAMIC_FTRACE=y
> > CONFIG_DYNAMIC_FTRACE_WITH_ARGS=y
> > CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
> > CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
> > CONFIG_FTRACE_MCOUNT_RECORD=y
> > CONFIG_FTRACE_MCOUNT_USE_OBJTOOL=y
> > CONFIG_FUNCTION_GRAPH_RETVAL=y
> > CONFIG_FUNCTION_GRAPH_TRACER=y
> > CONFIG_FUNCTION_TRACER=y

CONFIG_FUNCTION_TRACER should enable the build with -mnop-mcount or
-mfentry that ends up with nop5 at the function entry

> > CONFIG_GENERIC_TRACER=y
> > CONFIG_HAVE_FUNCTION_GRAPH_RETVAL=y
> > CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
> > CONFIG_KPROBES_ON_FTRACE=y
> > CONFIG_TASKS_RUDE_RCU=y
> 
> The above config diff makes it work (ie. it no longer fails with EBUSY).
> 
> I'm not sure which exact option fixes things
> (I can test if someone has a more specific guess)
> 
> However, I've tracked it down via extensive printk debugging to a pair
> of 'bugs'.
> 
> (a) there is no 5-byte nop in bpf_lsm hooks in the bad config
> (confirmed via disassembly).
> this can be fixed (worked around) via:
> 
> diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
> index 3bc61628ab25..f38744df79c8 100644
> --- a/kernel/bpf/bpf_lsm.c
> +++ b/kernel/bpf/bpf_lsm.c
> @@ -21,7 +21,8 @@
>   * function where a BPF program can be attached.
>   */
>  #define LSM_HOOK(RET, DEFAULT, NAME, ...)      \
> -noinline RET bpf_lsm_##NAME(__VA_ARGS__)       \
> +noinline __attribute__((patchable_function_entry(5))) \
> +RET bpf_lsm_##NAME(__VA_ARGS__) \
>  {                                              \
>         return DEFAULT;                         \
>  }
> 
> [this is likely wrong for non-x86_64, and probably should be
> conditional on something, and may be clang specific...]

we used to do something remotely similar with patchable_function_entry
for bpf_dispatcher [1], but ended up with changing that later [2]

[1] ceea991a019c bpf: Move bpf_dispatcher function out of ftrace locations
    dbe69b299884 bpf: Fix dispatcher patchable function entry to 5 bytes nop

[2] 18acb7fac22f bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")
    c86df29d11df bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)

> 
> AFAICT the existence of a bpf_lsm hook makes no sense without the nop in it.
> So this is imho a kernel bug.

yes, all the bpf_lsm_* hooks are there for the lsm bpf program as
attachment points and they attach as bpf tracing program ... and we have
ftrace subsystem that keeps track and manages nop5 at function entry for
tracing purposes.. hence the ftrace dependency

perhaps we could generate nops for set of bpf_lsm_* functions if FTRACE
is not enabled, but attachment code already depends on ftrace setup as
you found below with the nop5 and I wonder there will be other surprises

cc Steven

> 
> (b) this results in a *different* (but valid) 5 byte nop then the
> kernel expects (confirmed by disassembly).
> 
> 0f 1f 44 00 08
> vs
> 0f 1f 44 00 00
> 
> It looks to be a compiler (clang I believe) vs tooling (objdump or
> kernel itself?) type of problem.
> [MCOUNT_USE_OBJTOOL makes me think the compiler is being instructed to
> add mcounts that are then replaced with nops, possibly at build time
> via objtool????]

ftrace initializes that at start, check ftrace_init

jirka

> 
> Anyway, this can be fixed via:
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index ccb2f7703c33..e782acbe6ada 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -575,7 +575,8 @@ static int emit_jump(u8 **pprog, void *func, void *ip)
>  static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
>                                 void *old_addr, void *new_addr)
>  {
> -       const u8 *nop_insn = x86_nops[5];
> +       const u8 nop5alt[5] = { 0x0f,0x1f,0x44,0x00,0x08 };
> +       const u8 *nop_insn = nop5alt;
>         u8 old_insn[X86_PATCH_SIZE];
>         u8 new_insn[X86_PATCH_SIZE];
>         u8 *prog;
> 
> Any thoughts on how to fix this more correctly?
> 
> I'm guessing:
> (a) the bpf lsm hooks should be tagged with some 'traceable' macro,
> which should be arch-specific and empty once mcount/fentry or
> something are enabled.
> (b) I'm guessing the x86 poke function needs to accept multiple forms
> of 5 byte pokes.
> 
> But I'm not really sure what would be acceptable here in terms of
> *how* to actually implement this.
> 
> --
> Maciej Zenczykowski, Kernel Networking Developer @ Google
>