On Fri, 20 Jun 2025 at 20:57, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Fri, Jun 20, 2025 at 11:52 AM Kumar Kartikeya Dwivedi > <memxor@xxxxxxxxx> wrote: > > > > On Fri, 20 Jun 2025 at 20:44, Alexei Starovoitov > > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > > > On Thu, Jun 19, 2025 at 8:11 PM Emil Tsalapatis <emil@xxxxxxxxxxxxxxx> wrote: > > > > > > > > Add a new BPF arena kfunc from protecting a range of pages. These pages > > > > cannot be allocated, either explicitly through bpf_arena_alloc_pages() > > > > or implicitly through userspace page faults. > > > > > > > > Signed-off-by: Emil Tsalapatis <emil@xxxxxxxxxxxxxxx> > > > > --- > > > > kernel/bpf/arena.c | 95 ++++++++++++++++++++++++++++++++++++++++++++-- > > > > 1 file changed, 92 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c > > > > index 0d56cea71602..2f9293eb7151 100644 > > > > --- a/kernel/bpf/arena.c > > > > +++ b/kernel/bpf/arena.c > > > > @@ -48,6 +48,7 @@ struct bpf_arena { > > > > u64 user_vm_end; > > > > struct vm_struct *kern_vm; > > > > struct range_tree rt; > > > > + struct range_tree rt_guard; > > > > > > ... > > > > > > > } > > > > @@ -282,6 +298,11 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf) > > > > /* User space requested to segfault when page is not allocated by bpf prog */ > > > > return VM_FAULT_SIGSEGV; > > > > > > > > + /* Make sure the page is not guarded. */ > > > > + ret = is_range_tree_set(&arena->rt_guard, vmf->pgoff, 1); > > > > + if (ret) > > > > + return VM_FAULT_SIGSEGV; > > > > + > > > > ret = range_tree_clear(&arena->rt, vmf->pgoff, 1); > > > > > > Why complicate things with another tree ? > > > The logic has to range_tree_clear(&arena->rt, ... anyway > > > and here check: > > > is_range_tree_set(&arena->rt, ... > > > > > > bpf_arena_guard_pages() won't have EALREADY errors, so be it. > > > Keeping another range_tree and spending kernel memory > > > just to produce an error to buggy bpf prog is imo wrong trade off. > > > > IIUC the main requirement is reserving a region that cannot be faulted > > in user space, and cannot be allocated from the BPF side. > > I would instead add a flag that when set overrides the SIGSEGV/page-in > > behavior (which can be set globally by a flag on the map). > > That sounds more generic and potentially useful to pick the behavior > > on a per-allocation basis instead of making it global. > > So for specific allocations, we get SEGSEGV instead of paging in > > memory, while for the rest it's the default based on map's flags. > > And to prevent anybody else from allocating this range, reserve it > > ahead of time in the scheduler's init() callback. > > For normal programs it can be an extra prog run before the program is > > attached and starts firing. > > We won't need a new kfunc either. > > I'm not following the idea. > There is already a flag: > if (arena->map.map_flags & BPF_F_SEGV_ON_FAULT) > /* User space requested to segfault when page is not > allocated by bpf prog */ > return VM_FAULT_SIGSEGV; Yeah, but it's global. We may want fault-in for normal memory on access, but we may want some range in the address space to have SEGV_ON_FAULT behavior. It allows some allocations to pick their fault-in/segv behavior independent of the arena user's choice, which I think is important for library like code / malloc etc. The thing described above still wastes some memory because we map pages. I feel the proper way to do this is how one would do something like this in user space. Have pages mapped in a range and remove PROT_READ | PROT_WRITE from them, making them inaccessible. Ideally it'd be zero pages but if we cannot do that, wasting some memory for guard regions may not be too bad, for now. We need mprotect() like capabilities to change r/w permissions to create such guard pages, e.g. for a malloc with some debug features / efence-like protection.