Re: [RFC bpf-next 3/4] bpf: Generating a stubbed version of BPF program for termination

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Mon, 12 May 2025 17:20:07 -0700

On Mon, May 12, 2025 at 5:07 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
>
> - From verification point of view:
>   this function is RET_VOID and is not in
>   find_in_skiplist(), patch_generator() would replace its call with a
>   dummy. However, a corresponding bpf_spin_unlock() would remain and thus
>   bpf_check() will exit with error.
>   So, you would need some special version of bpf_check, that collects
>   all resources needed for program translation (e.g. maps), but does
>   not perform semantic checks.
>   Or patch_generator() has to be called for a program that is already
>   verified.

No. let's not parametrize bpf_check.

Here is what I proposed earlier in the thread:

the verifier should just remember all places where kfuncs
and helpers return _OR_NULL,
then when the verification is complete, copy the prog,
replaces 'call kfunc/help' with 'call stub',
run two JITs, and compare JIT artifacts
to make sure IPs match.

But thinking about it more...
I'm not sure any more that it's a good idea to fast execute
the program on one cpu and let it continue running as-is on
all other cpus including future invocations on this cpu.
So far the reasons to terminate bpf program:
- timeout in rqspinlock
- fault in arena
- some future watchdog

In all cases the program is buggy, so it's safer
from kernel pov and from data integrity pov to stop
all instances now and prevent future invocations.
So I think we should patch the prog text in run-time
without cloning.

The verifier should prepare an array of patches in
text_poke_bp_batch() format and when timeout/fault detected
do one call to text_poke_bp_batch() to stub out the whole prog.

At least on x86 we always emit nop5 in the prologue,
so we can patch it with goto exit as well.
Then the prog will be completely gutted.