On Fri, 20 Jun 2025 at 20:44, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Thu, Jun 19, 2025 at 8:11 PM Emil Tsalapatis <emil@xxxxxxxxxxxxxxx> wrote: > > > > Add a new BPF arena kfunc from protecting a range of pages. These pages > > cannot be allocated, either explicitly through bpf_arena_alloc_pages() > > or implicitly through userspace page faults. > > > > Signed-off-by: Emil Tsalapatis <emil@xxxxxxxxxxxxxxx> > > --- > > kernel/bpf/arena.c | 95 ++++++++++++++++++++++++++++++++++++++++++++-- > > 1 file changed, 92 insertions(+), 3 deletions(-) > > > > diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c > > index 0d56cea71602..2f9293eb7151 100644 > > --- a/kernel/bpf/arena.c > > +++ b/kernel/bpf/arena.c > > @@ -48,6 +48,7 @@ struct bpf_arena { > > u64 user_vm_end; > > struct vm_struct *kern_vm; > > struct range_tree rt; > > + struct range_tree rt_guard; > > ... > > > } > > @@ -282,6 +298,11 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf) > > /* User space requested to segfault when page is not allocated by bpf prog */ > > return VM_FAULT_SIGSEGV; > > > > + /* Make sure the page is not guarded. */ > > + ret = is_range_tree_set(&arena->rt_guard, vmf->pgoff, 1); > > + if (ret) > > + return VM_FAULT_SIGSEGV; > > + > > ret = range_tree_clear(&arena->rt, vmf->pgoff, 1); > > Why complicate things with another tree ? > The logic has to range_tree_clear(&arena->rt, ... anyway > and here check: > is_range_tree_set(&arena->rt, ... > > bpf_arena_guard_pages() won't have EALREADY errors, so be it. > Keeping another range_tree and spending kernel memory > just to produce an error to buggy bpf prog is imo wrong trade off. IIUC the main requirement is reserving a region that cannot be faulted in user space, and cannot be allocated from the BPF side. I would instead add a flag that when set overrides the SIGSEGV/page-in behavior (which can be set globally by a flag on the map). That sounds more generic and potentially useful to pick the behavior on a per-allocation basis instead of making it global. So for specific allocations, we get SEGSEGV instead of paging in memory, while for the rest it's the default based on map's flags. And to prevent anybody else from allocating this range, reserve it ahead of time in the scheduler's init() callback. For normal programs it can be an extra prog run before the program is attached and starts firing. We won't need a new kfunc either.