> If we have such bugs that prog in NMI can stall CPU indefinitely
> they need to be fixed independently of fast-execute.
> timed may_goto, tailcalls or whatever may need to have different
> limits when it detects that the prog is running in NMI or with hard irqs
> disabled. Fast-execute doesn't have to be a universal kill-bpf-prog
> mechanism that can work in any context. I think fast-execute
> is for progs that deadlocked in res_spin_lock, faulted arena,
> or were slow for wrong reasons, but not fatal for the kernel reasons.
> imo we can rely on schedule_work() and bpf_arch_text_poke() from there.
> The alternative of clone of all progs and memory waste for a rare case
> is not appealing. Unless we can detect "dangerous" progs and
> clone with fast execute only for them, so that the majority of bpf progs
> stay as single copy.

I just want to confirm that we are on the same page here: the RFC we
sent used prog cloning, but Kumar's earlier suggestion of implementing
offset tables avoids the complete cloning process and the associated
memory footprint. Is there anything else about memory overhead that
concerns you?

Regarding the NMI issue, the fast-execute design was meant to handle
stalls in tracing and other task-context programs that run slowly for
some reason. I agree with your point that deadlocks in NMI should be
solved independently, but Kumar's point that several BPF programs
needing termination run in hard IRQ context puts us in a fix. What
should be the way forward here?