On Fri, Jun 20, 2025 at 11:52 AM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote: > > On Fri, 20 Jun 2025 at 20:44, Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Thu, Jun 19, 2025 at 8:11 PM Emil Tsalapatis <emil@xxxxxxxxxxxxxxx> wrote: > > > > > > Add a new BPF arena kfunc from protecting a range of pages. These pages > > > cannot be allocated, either explicitly through bpf_arena_alloc_pages() > > > or implicitly through userspace page faults. > > > > > > Signed-off-by: Emil Tsalapatis <emil@xxxxxxxxxxxxxxx> > > > --- > > > kernel/bpf/arena.c | 95 ++++++++++++++++++++++++++++++++++++++++++++-- > > > 1 file changed, 92 insertions(+), 3 deletions(-) > > > > > > diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c > > > index 0d56cea71602..2f9293eb7151 100644 > > > --- a/kernel/bpf/arena.c > > > +++ b/kernel/bpf/arena.c > > > @@ -48,6 +48,7 @@ struct bpf_arena { > > > u64 user_vm_end; > > > struct vm_struct *kern_vm; > > > struct range_tree rt; > > > + struct range_tree rt_guard; > > > > ... > > > > > } > > > @@ -282,6 +298,11 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf) > > > /* User space requested to segfault when page is not allocated by bpf prog */ > > > return VM_FAULT_SIGSEGV; > > > > > > + /* Make sure the page is not guarded. */ > > > + ret = is_range_tree_set(&arena->rt_guard, vmf->pgoff, 1); > > > + if (ret) > > > + return VM_FAULT_SIGSEGV; > > > + > > > ret = range_tree_clear(&arena->rt, vmf->pgoff, 1); > > > > Why complicate things with another tree ? > > The logic has to range_tree_clear(&arena->rt, ... anyway > > and here check: > > is_range_tree_set(&arena->rt, ... > > > > bpf_arena_guard_pages() won't have EALREADY errors, so be it. > > Keeping another range_tree and spending kernel memory > > just to produce an error to buggy bpf prog is imo wrong trade off. > > IIUC the main requirement is reserving a region that cannot be faulted > in user space, and cannot be allocated from the BPF side. > I would instead add a flag that when set overrides the SIGSEGV/page-in > behavior (which can be set globally by a flag on the map). > That sounds more generic and potentially useful to pick the behavior > on a per-allocation basis instead of making it global. > So for specific allocations, we get SEGSEGV instead of paging in > memory, while for the rest it's the default based on map's flags. > And to prevent anybody else from allocating this range, reserve it > ahead of time in the scheduler's init() callback. > For normal programs it can be an extra prog run before the program is > attached and starts firing. > We won't need a new kfunc either. I'm not following the idea. There is already a flag: if (arena->map.map_flags & BPF_F_SEGV_ON_FAULT) /* User space requested to segfault when page is not allocated by bpf prog */ return VM_FAULT_SIGSEGV;