Re: [RFC bpf-next v2 0/4] bpf: Fast-Path approach for BPF program

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Mon, 30 Jun 2025 15:03:32 +0200

On Sat, 14 Jun 2025 at 08:41, Siddharth Chintamaneni
<sidchintamaneni@xxxxxxxxx> wrote:
>
> This is RFC v2 of
>         https://lore.kernel.org/bpf/20250420105524.2115690-1-rjsu26@xxxxxxxxx/
>
> Change since v1:
> - Patch generation has been moved after verification and before JIT.
>         - Now patch generation handles both helpers and kfuncs.
>         - Sanity check on original prog and patch prog after JIT.
> - Runtime termination handler is now global termination mechanism using
>   text_poke.
> - Termination is triggered by watchdog timer.
>
> TODO:
> - Termination support for tailcall programs.
> - Fix issue caused by warning in runtime termination handler due to
>   https://elixir.bootlin.com/linux/v6.15.1/source/kernel/smp.c#L815
> - Free memory for patch progs related fields.
> - Include selftests covering more cases such as BPF program nesting.

Thanks for sharing the new version. I think there's a few
implementation issues that need to be sorted out, on a high-level.
- Move arch-specific bits into arch/x86 and ensure all this is not
invoked / disabled on other architectures.
- Drop the hrtimer based termination enforcement.
- Drop extra prog cloning, since I don't see the need anymore.
- Add an early-short circuit by padding the entry with nop5 and then
changing it to return early.
- Other comments left inline.

Code patching when not in_task() looks problematic, I will think more
but I guess deferring work to some wq context is probably the best
option.
We can potentially be in an NMI context as well when we stall, so
either it needs to be made safe such that we can invoke it from any
context, or we defer work.
Right now, the invocation happens from a timer callback, but we may
synchronously invoke fast-execute termination request for cond_break
timeouts or rqspinlock deadlock.

Otherwise it might be simpler to just swap the return address with a
copy of a patched/cloned program on the local CPU, and replace
prog_func globally. But we discarded that approach.
Doesn't feel right to go back there just because we can't modify text
from any context.

In terms of design we still need to figure out how to handle tail
calls and extension programs.
Tail calls are relatively easier, since they can potentially fail.
Extension programs with the current approach become tricky.
If you had cloned the program during verification and replaced the
original with it at runtime, then any extensions attached to the
program do not get invoked.
Now, since we patch the original program directly, it's possible that
we still enter an extension program and end up stalling there again.
I guess we will end up patching the extension program next, but it
would be nice to verify that this works as expected.
Other options are removing the extension program by patching it out
and replacing with original global function target which would have
been patched.

>
>  include/linux/bpf.h                           |  19 +-
>  include/linux/bpf_verifier.h                  |   4 +
>  include/linux/filter.h                        |  43 ++-
>  kernel/bpf/core.c                             |  64 +++++
>  kernel/bpf/syscall.c                          | 225 ++++++++++++++++
>  kernel/bpf/trampoline.c                       |   5 +
>  kernel/bpf/verifier.c                         | 245 +++++++++++++++++-
>  .../bpf/prog_tests/bpf_termination.c          |  39 +++
>  .../selftests/bpf/progs/bpf_termination.c     |  38 +++
>  9 files changed, 674 insertions(+), 8 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_termination.c
>  create mode 100644 tools/testing/selftests/bpf/progs/bpf_termination.c
>
> --
> 2.43.0
>