Re: [PATCH] bpf: Fix bpf_prog nested call in trace_mmap_lock_acquire_returned

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Mon, 12 May 2025 16:59:45 -0700

On Mon, May 12, 2025 at 7:59 AM Tao Chen <chen.dylane@xxxxxxxxx> wrote:
>
> syzkaller reported an issue:
>
>  bpf_prog_ec3b2eefa702d8d3+0x43/0x47
>  bpf_dispatcher_nop_func include/linux/bpf.h:1316 [inline]
>  __bpf_prog_run include/linux/filter.h:718 [inline]
>  bpf_prog_run include/linux/filter.h:725 [inline]
>  __bpf_trace_run kernel/trace/bpf_trace.c:2363 [inline]
>  bpf_trace_run3+0x23f/0x5a0 kernel/trace/bpf_trace.c:2405
>  __bpf_trace_mmap_lock_acquire_returned+0xfc/0x140 include/trace/events/mmap_lock.h:47
>  __traceiter_mmap_lock_acquire_returned+0x79/0xc0 include/trace/events/mmap_lock.h:47
>  __do_trace_mmap_lock_acquire_returned include/trace/events/mmap_lock.h:47 [inline]
>  trace_mmap_lock_acquire_returned include/trace/events/mmap_lock.h:47 [inline]
>  __mmap_lock_do_trace_acquire_returned+0x138/0x1f0 mm/mmap_lock.c:35
>  __mmap_lock_trace_acquire_returned include/linux/mmap_lock.h:36 [inline]
>  mmap_read_trylock include/linux/mmap_lock.h:204 [inline]
>  stack_map_get_build_id_offset+0x535/0x6f0 kernel/bpf/stackmap.c:157
>  __bpf_get_stack+0x307/0xa10 kernel/bpf/stackmap.c:483
>  ____bpf_get_stack kernel/bpf/stackmap.c:499 [inline]
>  bpf_get_stack+0x32/0x40 kernel/bpf/stackmap.c:496
>  ____bpf_get_stack_raw_tp kernel/trace/bpf_trace.c:1941 [inline]
>  bpf_get_stack_raw_tp+0x124/0x160 kernel/trace/bpf_trace.c:1931
>  bpf_prog_ec3b2eefa702d8d3+0x43/0x47
>  bpf_dispatcher_nop_func include/linux/bpf.h:1316 [inline]
>  __bpf_prog_run include/linux/filter.h:718 [inline]
>  bpf_prog_run include/linux/filter.h:725 [inline]
>  __bpf_trace_run kernel/trace/bpf_trace.c:2363 [inline]
>  bpf_trace_run3+0x23f/0x5a0 kernel/trace/bpf_trace.c:2405
>  __bpf_trace_mmap_lock_acquire_returned+0xfc/0x140 include/trace/events/mmap_lock.h:47
>  __traceiter_mmap_lock_acquire_returned+0x79/0xc0 include/trace/events/mmap_lock.h:47
>  __do_trace_mmap_lock_acquire_returned include/trace/events/mmap_lock.h:47 [inline]
>  trace_mmap_lock_acquire_returned include/trace/events/mmap_lock.h:47 [inline]
>  __mmap_lock_do_trace_acquire_returned+0x138/0x1f0 mm/mmap_lock.c:35
>  __mmap_lock_trace_acquire_returned include/linux/mmap_lock.h:36 [inline]
>  mmap_read_lock include/linux/mmap_lock.h:185 [inline]
>  exit_mm kernel/exit.c:565 [inline]
>  do_exit+0xf72/0x2c30 kernel/exit.c:940
>  do_group_exit+0xd3/0x2a0 kernel/exit.c:1102
>  __do_sys_exit_group kernel/exit.c:1113 [inline]
>  __se_sys_exit_group kernel/exit.c:1111 [inline]
>  __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1111
>  x64_sys_call+0x1530/0x1730 arch/x86/include/generated/asm/syscalls_64.h:232
>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>  do_syscall_64+0xcd/0x260 arch/x86/entry/syscall_64.c:94
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> mmap_read_trylock is used in stack_map_get_build_id_offset, if user
> wants to trace trace_mmap_lock_acquire_returned tracepoint and get user
> stack in the bpf_prog, it will call trace_mmap_lock_acquire_returned
> again in the bpf_get_stack, which will lead to a nested call relationship.
>
> Reported-by: syzbot+45b0c89a0fc7ae8dbadc@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://lore.kernel.org/bpf/8bc2554d-1052-4922-8832-e0078a033e1d@xxxxxxxxx
> Fixes: 2f1aaf3ea666 ("bpf, mm: Fix lockdep warning triggered by stack_map_get_build_id_offset()")
> Signed-off-by: Tao Chen <chen.dylane@xxxxxxxxx>
> ---
>  kernel/bpf/stackmap.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 3615c06b7dfa..eec51f069028 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -130,6 +130,10 @@ static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, b
>                          : build_id_parse_nofault(vma, build_id, NULL);
>  }
>
> +static inline bool mmap_read_trylock_no_trace(struct mm_struct *mm)
> +{
> +       return down_read_trylock(&mm->mmap_lock) != 0;
> +}
>  /*
>   * Expects all id_offs[i].ip values to be set to correct initial IPs.
>   * They will be subsequently:
> @@ -154,7 +158,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
>          * build_id.
>          */
>         if (!user || !current || !current->mm || irq_work_busy ||
> -           !mmap_read_trylock(current->mm)) {
> +           !mmap_read_trylock_no_trace(current->mm)) {

This is not a fix.
It doesn't address the issue.
Since syzbot managed to craft such corner case,
let's remove WARN_ON_ONCE from get_bpf_raw_tp_regs() for now.

In the long run we may consider adding a per-tracepoint
recursion count for particularly dangerous tracepoints like this one,
but let's not do it just yet.
Removing WARN_ON_ONCE should do it.

pw-bot: cr