6.12.30 x86_64 BPF_LSM doesn't work without (?) fentry/mcount config options

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is on an Android 6.12 GKI kernel for cuttlefish VM, but the
Android-ness probably shouldn't matter.

$ cat config.bad | grep BPF_LSM
CONFIG_BPF_LSM=y

Trying to attach bpf_lsm_bpf_prog_load hook fails with -EBUSY (always).

So I decided to enable function graph with retval tracing to track
down the source of EBUSY... and it started reliably working.

$ diff config.bad config.works | egrep '^[<>]' | sort
< 6.12.30-android16-5-g5c72e9fabab7-ab13770814
< # CONFIG_ENABLE_DEFAULT_TRACERS is not set
< # CONFIG_FUNCTION_TRACER is not set

> 6.12.30-android16-5-maybe-dirty
> # CONFIG_FPROBE is not set
> # CONFIG_FTRACE_RECORD_RECURSION is not set
> # CONFIG_FTRACE_SORT_STARTUP_TEST is not set
> # CONFIG_FTRACE_STARTUP_TEST is not set
> # CONFIG_FTRACE_VALIDATE_RCU_IS_WATCHING is not set
> # CONFIG_FUNCTION_PROFILER is not set
> # CONFIG_HID_BPF is not set
> # CONFIG_KPROBE_EVENTS_ON_NOTRACE is not set
> # CONFIG_LIVEPATCH is not set
> # CONFIG_PSTORE_FTRACE is not set
> CONFIG_BUILDTIME_MCOUNT_SORT=y
> CONFIG_DYNAMIC_FTRACE=y
> CONFIG_DYNAMIC_FTRACE_WITH_ARGS=y
> CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS=y
> CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
> CONFIG_FTRACE_MCOUNT_RECORD=y
> CONFIG_FTRACE_MCOUNT_USE_OBJTOOL=y
> CONFIG_FUNCTION_GRAPH_RETVAL=y
> CONFIG_FUNCTION_GRAPH_TRACER=y
> CONFIG_FUNCTION_TRACER=y
> CONFIG_GENERIC_TRACER=y
> CONFIG_HAVE_FUNCTION_GRAPH_RETVAL=y
> CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
> CONFIG_KPROBES_ON_FTRACE=y
> CONFIG_TASKS_RUDE_RCU=y

The above config diff makes it work (ie. it no longer fails with EBUSY).

I'm not sure which exact option fixes things
(I can test if someone has a more specific guess)

However, I've tracked it down via extensive printk debugging to a pair
of 'bugs'.

(a) there is no 5-byte nop in bpf_lsm hooks in the bad config
(confirmed via disassembly).
this can be fixed (worked around) via:

diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 3bc61628ab25..f38744df79c8 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -21,7 +21,8 @@
  * function where a BPF program can be attached.
  */
 #define LSM_HOOK(RET, DEFAULT, NAME, ...)      \
-noinline RET bpf_lsm_##NAME(__VA_ARGS__)       \
+noinline __attribute__((patchable_function_entry(5))) \
+RET bpf_lsm_##NAME(__VA_ARGS__) \
 {                                              \
        return DEFAULT;                         \
 }

[this is likely wrong for non-x86_64, and probably should be
conditional on something, and may be clang specific...]

AFAICT the existence of a bpf_lsm hook makes no sense without the nop in it.
So this is imho a kernel bug.

(b) this results in a *different* (but valid) 5 byte nop then the
kernel expects (confirmed by disassembly).

0f 1f 44 00 08
vs
0f 1f 44 00 00

It looks to be a compiler (clang I believe) vs tooling (objdump or
kernel itself?) type of problem.
[MCOUNT_USE_OBJTOOL makes me think the compiler is being instructed to
add mcounts that are then replaced with nops, possibly at build time
via objtool????]

Anyway, this can be fixed via:

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index ccb2f7703c33..e782acbe6ada 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -575,7 +575,8 @@ static int emit_jump(u8 **pprog, void *func, void *ip)
 static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
                                void *old_addr, void *new_addr)
 {
-       const u8 *nop_insn = x86_nops[5];
+       const u8 nop5alt[5] = { 0x0f,0x1f,0x44,0x00,0x08 };
+       const u8 *nop_insn = nop5alt;
        u8 old_insn[X86_PATCH_SIZE];
        u8 new_insn[X86_PATCH_SIZE];
        u8 *prog;

Any thoughts on how to fix this more correctly?

I'm guessing:
(a) the bpf lsm hooks should be tagged with some 'traceable' macro,
which should be arch-specific and empty once mcount/fentry or
something are enabled.
(b) I'm guessing the x86 poke function needs to accept multiple forms
of 5 byte pokes.

But I'm not really sure what would be acceptable here in terms of
*how* to actually implement this.

--
Maciej Zenczykowski, Kernel Networking Developer @ Google




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux