On Tue, Aug 26, 2025 at 7:58 PM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
>
>
>
> On 27/8/25 10:23, Alexei Starovoitov wrote:
> > On Tue, Aug 26, 2025 at 7:13 PM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> I’ve encountered a reproducible deadlock while developing the funcgraph
> >> feature for bpfsnoop [0].
> >
> > debug it pls.
>
> It’s quite difficult for me. I’ve tried debugging it but didn’t succeed.
>
> > Sounds like you're implying that the root cause is in bpf,
> > but why do you think so?
> >
> > You're attaching to things that shouldn't be attached to.
> > Like rcu_lockdep_current_cpu_online()
> > so effectively you're recursing in that lockdep code.
> > See big lock there. It will dead lock for sure.
>
> If a function that acquires a lock can be traced by a tracing program,
> bpfsnoop’s funcgraph will attempt to trace it as well. In such cases, a
> deadlock is highly likely to occur.
>
> With bpfsnoop I try my best to avoid such deadlock issues. But what
> about other bpf tracing tools? If they don’t handle this properly, the
> kernel is very likely to crash.

bpf infra is trying hard not to crash it,
but debug kernel is a different category.
rcu_read_lock_held() doesn't exist in production kernels.
You can propose adding "notrace" for it, but in general that doesn't scale.
Same with rcu_lockdep_current_cpu_online().
It probably deserves "notrace" too.