On Mon, 12 May 2025 at 17:20, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Mon, May 12, 2025 at 5:07 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > > > > > - From verification point of view: > > this function is RET_VOID and is not in > > find_in_skiplist(), patch_generator() would replace its call with a > > dummy. However, a corresponding bpf_spin_unlock() would remain and thus > > bpf_check() will exit with error. > > So, you would need some special version of bpf_check, that collects > > all resources needed for program translation (e.g. maps), but does > > not perform semantic checks. > > Or patch_generator() has to be called for a program that is already > > verified. > > No. let's not parametrize bpf_check. > > Here is what I proposed earlier in the thread: > > the verifier should just remember all places where kfuncs > and helpers return _OR_NULL, > then when the verification is complete, copy the prog, > replaces 'call kfunc/help' with 'call stub', > run two JITs, and compare JIT artifacts > to make sure IPs match. > This is something that we've experimented with last week https://github.com/sidchintamaneni/bpf/commits/bpf_term/v2_exploration/. We did the cloning part after `do_misc_fixups` and before `fixup_call_args` inside bpf_check(). > But thinking about it more... > I'm not sure any more that it's a good idea to fast execute > the program on one cpu and let it continue running as-is on > all other cpus including future invocations on this cpu. > So far the reasons to terminate bpf program: > - timeout in rqspinlock > - fault in arena > - some future watchdog > Also long running interators, for example - https://github.com/sidchintamaneni/os-dev-env/blob/main/bpf-programs-catalog/research/termination/patch_gen_testing/bpf_loop_lr.kern.c Eventhough this is just an example, this could be possible when iterating through a map which grows unconditionally. > In all cases the program is buggy, so it's safer > from kernel pov and from data integrity pov to stop > all instances now and prevent future invocations. > So I think we should patch the prog text in run-time > without cloning. > Yes, this is something that we had in mind: 1. Terminate the program on a single CPU 2. Terminate the program on all CPUs and de-link it Single CPU termination could be useful when a BPF program is using a per-CPU map and the map on a single CPU grows, causing the iterator to take a lot of time. > The verifier should prepare an array of patches in > text_poke_bp_batch() format and when timeout/fault detected > do one call to text_poke_bp_batch() to stub out the whole prog. > Do you mean creating different versions of patches and applying one of them based on the current execution state? > At least on x86 we always emit nop5 in the prologue, > so we can patch it with goto exit as well. > Then the prog will be completely gutted.