Re: [RFC bpf-next 3/4] bpf: Generating a stubbed version of BPF program for termination

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 14 May 2025 18:59:09 -0700

On Mon, May 12, 2025 at 10:16 PM Siddharth Chintamaneni
<sidchintamaneni@xxxxxxxxx> wrote:
>
> On Mon, 12 May 2025 at 17:20, Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Mon, May 12, 2025 at 5:07 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> > >
> > >
> > > - From verification point of view:
> > >   this function is RET_VOID and is not in
> > >   find_in_skiplist(), patch_generator() would replace its call with a
> > >   dummy. However, a corresponding bpf_spin_unlock() would remain and thus
> > >   bpf_check() will exit with error.
> > >   So, you would need some special version of bpf_check, that collects
> > >   all resources needed for program translation (e.g. maps), but does
> > >   not perform semantic checks.
> > >   Or patch_generator() has to be called for a program that is already
> > >   verified.
> >
> > No. let's not parametrize bpf_check.
> >
> > Here is what I proposed earlier in the thread:
> >
> > the verifier should just remember all places where kfuncs
> > and helpers return _OR_NULL,
> > then when the verification is complete, copy the prog,
> > replaces 'call kfunc/help' with 'call stub',
> > run two JITs, and compare JIT artifacts
> > to make sure IPs match.
> >
>
> This is something that we've experimented with last week
> https://github.com/sidchintamaneni/bpf/commits/bpf_term/v2_exploration/.
> We did the cloning part after `do_misc_fixups` and before
> `fixup_call_args` inside
> bpf_check().

Something like that but it needs to handle both helpers and kfuncs,
and you don't need new fake helpers.
text_poke_bp_batch() doesn't care what insn is being patched.

> > But thinking about it more...
> > I'm not sure any more that it's a good idea to fast execute
> > the program on one cpu and let it continue running as-is on
> > all other cpus including future invocations on this cpu.
> > So far the reasons to terminate bpf program:
> > - timeout in rqspinlock
> > - fault in arena
> > - some future watchdog
> >
>
> Also long running interators, for example -
> https://github.com/sidchintamaneni/os-dev-env/blob/main/bpf-programs-catalog/research/termination/patch_gen_testing/bpf_loop_lr.kern.c
> Eventhough this is just an example, this could be possible when
> iterating through a map which grows unconditionally.

In terms of detection this is a subset of watchdog.
In terms of termination we still need to figure out the best
path forward.
bpf_loop() may or may not be inlined.
If it's still a helper call then we can have per-prog "stop_me" flag,
but it will penalize run-time, and won't really work for
inlined (unless we force inlining logic to consult that flag
as well).
One option is to patch the callback subprog to return 1,
but the callback might not have a branch that returns 1.
Another option is to remember the insn that does:
        /* loop header,
         * if reg_loop_cnt >= reg_loop_max skip the loop body
         */
        insn_buf[cnt++] = BPF_JMP_REG(BPF_JGE, reg_loop_cnt, reg_loop_max, 5);

in inlined bpf_loop() and patch that insn with 'goto +5',
so inlined bpf_loop will terminate quickly.

> > In all cases the program is buggy, so it's safer
> > from kernel pov and from data integrity pov to stop
> > all instances now and prevent future invocations.
> > So I think we should patch the prog text in run-time
> > without cloning.
> >
>
> Yes, this is something that we had in mind:
> 1. Terminate the program on a single CPU
> 2. Terminate the program on all CPUs and de-link it
>
> Single CPU termination could be useful when a BPF program is using a
> per-CPU map and the map on a single CPU grows, causing the iterator to
> take a lot of time.

I think de-link (and detach) is difficult.
The context where an abnormal condition is detected (like watchdog)
may not allow detaching.
So I think replacing nop5 in the prologue with 'goto out'
is better.

>
> > The verifier should prepare an array of patches in
> > text_poke_bp_batch() format and when timeout/fault detected
> > do one call to text_poke_bp_batch() to stub out the whole prog.
> >
> Do you mean creating different versions of patches and applying one of
> them based on the current execution state?

I mean the verifier should prepare the whole batch with all insns
that needs to be patched and apply the whole thing at once
with one call to text_poke_bp_batch() that will affect all cpus.
It doesn't matter where the program was running on this cpu.
It might be running somewhere else on a different cpu.
text_poke_bp_batch() won't be atomic, but it doesn't have to be.
Every cpu might have a different fast-execute path to exit.