On Fri, Jul 4, 2025 at 12:11 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
>
> On Fri, 4 Jul 2025 at 19:29, Raj Sahu <rjsu26@xxxxxxxxx> wrote:
> >
> > > > Introduces a watchdog based runtime mechanism to terminate
> > > > a BPF program. When a BPF program is interrupted by
> > > > the watchdog, its registers are passed to bpf_die.
> > > >
> > > > Inside bpf_die we perform the text_poke and stack walk
> > > > to stub helpers/kfuncs and replace the bpf_loop helper if
> > > > called inside the bpf program.
> > > >
> > > > The current implementation doesn't handle the termination
> > > > of tailcall programs.
> > > >
> > > > There is a known issue with calling text_poke inside interrupt
> > > > context - https://elixir.bootlin.com/linux/v6.15.1/source/kernel/smp.c#L815.
> > >
> > > I don't have a good idea so far, maybe deferring the work to wq
> > > context? Each CPU would need its own context and schedule work
> > > there. The problem is that it may not be invoked immediately.
> >
> > We will give it a try using wq. We were a bit hesitant to pursue
> > wq earlier because, to modify the return address on the stack, we
> > would want to interrupt the running BPF program and access its
> > stack, since that's a key part of the design.
> >
> > Will need some suggestions here on how to achieve that.
>
> Yeah, this is not trivial, now that I think more about it.
> You'd keep the stack state untouched and synchronize with the
> callback (spin until it signals us that it's done touching the
> stack). I guess we can do it from another CPU, not too bad.
>
> There's another problem though: wq execution not happening
> instantly is not a big deal, but it getting interrupted by yet
> another program that stalls can set up a cascading chain that
> leads to a lockup of the machine.
> So let's say we have a program that stalls in NMI/IRQ. It might
> happen that all CPUs that can service the wq enter this stall. The
> kthread is ready to run the wq callback (or in the middle of it)
> but it may be interrupted indefinitely.
> It seems like this is a more fundamental problem with the
> non-cloning approach. We can prevent program execution on the CPU
> where the wq callback will be run, but we can also have a case
> where all CPUs lock up simultaneously.

If we have bugs where a prog in NMI can stall a CPU indefinitely,
they need to be fixed independently of fast-execute.
Timed may_goto, tailcalls, or whatever may need different limits
when it detects that the prog is running in NMI or with hard irqs
disabled.
Fast-execute doesn't have to be a universal kill-bpf-prog mechanism
that can work in any context.
I think fast-execute is for progs that deadlocked in res_spin_lock,
faulted in arena, or were slow for the wrong reasons, but not for
reasons fatal to the kernel.
imo we can rely on schedule_work() and bpf_arch_text_poke() from there.
The alternative of cloning all progs and wasting memory for a rare
case is not appealing.
Unless we can detect "dangerous" progs and clone with fast-execute
only for them, so that the majority of bpf progs stay as a single
copy.
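
For concreteness, a minimal sketch of what that deferred path could
look like. Untested, and the names are made up for illustration
(struct bpf_term_ctx, bpf_die_cb(), bpf_watchdog_fired() don't exist
anywhere); only INIT_WORK(), schedule_work() and bpf_arch_text_poke()
are existing kernel interfaces:

#include <linux/bpf.h>
#include <linux/workqueue.h>

/* Made-up termination context, one per in-flight kill request. */
struct bpf_term_ctx {
	struct work_struct work;
	void *call_ip;		/* helper call site in the prog image */
	void *orig_helper;	/* current call target */
	void *stub_helper;	/* stub that makes the prog bail out */
};

static void bpf_die_cb(struct work_struct *work)
{
	struct bpf_term_ctx *ctx = container_of(work, struct bpf_term_ctx, work);

	/*
	 * Process context: safe to patch text here, unlike in the
	 * watchdog interrupt itself.
	 */
	bpf_arch_text_poke(ctx->call_ip, BPF_MOD_CALL,
			   ctx->orig_helper, ctx->stub_helper);
}

/* Watchdog path, runs in hard IRQ context: only queue the work. */
static void bpf_watchdog_fired(struct bpf_term_ctx *ctx)
{
	INIT_WORK(&ctx->work, bpf_die_cb);
	schedule_work(&ctx->work);
}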
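
The "spin until the callback signals us that it's done touching the
stack" synchronization could bolt onto the same made-up context,
roughly (again untested; stack_done is invented):

/* One extra field in the made-up struct bpf_term_ctx above: */
struct bpf_term_ctx {
	struct work_struct work;
	atomic_t stack_done;	/* callback finished touching the stack */
	/* ... remaining fields from the sketch above ... */
};

static void bpf_watchdog_fired(struct bpf_term_ctx *ctx)
{
	atomic_set(&ctx->stack_done, 0);
	INIT_WORK(&ctx->work, bpf_die_cb);
	schedule_work(&ctx->work);

	/*
	 * Keep the interrupted prog's stack frozen until the callback,
	 * running on another CPU, signals that it is done rewriting it.
	 */
	while (!atomic_read_acquire(&ctx->stack_done))
		cpu_relax();
}

/* ... and at the end of bpf_die_cb(), after the stack walk: */
	atomic_set_release(&ctx->stack_done, 1);

Note that this spin is exactly what turns a stall in NMI/IRQ context
into the cascading lockup described above, which is another reason to
keep fast-execute out of those contexts.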