Re: [BUG] Deadlock triggered by bpfsnoop funcgraph feature

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Thu, 28 Aug 2025 04:50:19 -0700

On Thu, Aug 28, 2025 at 10:40:47AM +0800, Leon Hwang wrote:
> On 28/8/25 08:42, Alexei Starovoitov wrote:
> > On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
> >> On 27/8/25 10:23, Alexei Starovoitov wrote:
> >>> On Tue, Aug 26, 2025 at 7:13 PM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I’ve encountered a reproducible deadlock while developing the funcgraph
> >>>> feature for bpfsnoop [0].
> >>>
> >>> debug it pls.
> >>
> >> It’s quite difficult for me. I’ve tried debugging it but didn’t succeed.
> >>
> >>> Sounds like you're implying that the root cause is in bpf,
> >>> but why do you think so?
> >>>
> >>> You're attaching to things that shouldn't be attached to.
> >>> Like rcu_lockdep_current_cpu_online()
> >>> so effectively you're recursing in that lockdep code.
> >>> See big lock there. It will deadlock for sure.
> >>
> >> If a function that acquires a lock can be traced by a tracing program,
> >> bpfsnoop’s funcgraph will attempt to trace it as well. In such cases, a
> >> deadlock is highly likely to occur.
> >>
> >> With bpfsnoop I try my best to avoid such deadlock issues. But what
> >> about other bpf tracing tools? If they don’t handle this properly, the
> >> kernel is very likely to crash.
> >
> > bpf infra is trying hard not to crash it, but debug kernel is a different
> > category. rcu_read_lock_held() doesn't exist in production kernels.
> > You can propose adding "notrace" for it, but in general that doesn't scale.
> > Same with rcu_lockdep_current_cpu_online().
> > It probably deserves "notrace" too.
> 
> Indeed, it doesn't scale.
> 
> When I run
>
>   ./bpfsnoop -k "htab_*_elem" --output-fgraph --fgraph-debug \
>     --fgraph-exclude 'rcu_read_lock_*held,rcu_lockdep_current_cpu_online,*raw_spin_*lock*,kvfree,show_stack,put_task_stack'
>
> the kernel doesn’t panic, but the OS eventually stalls and becomes
> unresponsive to key presses.
> 
> It seems preferable to avoid running BPF programs continuously in such
> cases.

Agreed, when adding code to the Linux kernel, whether via a patch, via
a BPF program, or by whatever other means, you are taking responsibility
for the speed, scalability, and latency effects of that code.

Nevertheless, I am happy to add a few "notrace" modifiers
if needed.  Do you guys need them for rcu_read_lock_held() and
rcu_lockdep_current_cpu_online()?
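
For anyone following along, the recursion Alexei describes above can
be sketched in user space. This is only an illustration under stated
assumptions, not the kernel code: big_lock, debug_helper(),
tracing_prog(), and simulate() are hypothetical names, a pthread mutex
stands in for the lock held by the lockdep helper, and trylock is used
so that the sketch reports the re-entry instead of hanging the way a
real spinlock would.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the non-recursive lock taken inside the debug helper. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static bool prog_attached;	/* hypothetical: is a tracing program attached? */
static int would_deadlock;	/* re-entries that would have spun forever */

static void debug_helper(void);

/* Hypothetical tracing program: its own execution reaches the helper again. */
static void tracing_prog(void)
{
	debug_helper();
}

/* Hypothetical traced helper that holds big_lock while the program runs. */
static void debug_helper(void)
{
	int rc;

	/* A real spinlock would spin here; trylock lets the sketch report it. */
	if (pthread_mutex_trylock(&big_lock) != 0) {
		would_deadlock++;
		return;
	}
	if (prog_attached)
		tracing_prog();
	rc = pthread_mutex_unlock(&big_lock);
	assert(rc == 0);
	(void)rc;
}

/* Returns how many times the helper re-entered itself with the lock held. */
int simulate(bool attached)
{
	prog_attached = attached;
	would_deadlock = 0;
	debug_helper();
	return would_deadlock;
}
```

With a program attached, simulate(true) counts one re-entry on the
already-held lock. With nothing attached, which is roughly the effect
"notrace" has on the helper, simulate(false) counts none.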

							Thanx, Paul