Re: [PATCH bpf-next 1/2] bpf/arena: add bpf_arena_guard_pages kfunc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 20, 2025 at 12:06 PM Kumar Kartikeya Dwivedi
<memxor@xxxxxxxxx> wrote:
>
> On Fri, 20 Jun 2025 at 20:57, Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Fri, Jun 20, 2025 at 11:52 AM Kumar Kartikeya Dwivedi
> > <memxor@xxxxxxxxx> wrote:
> > >
> > > On Fri, 20 Jun 2025 at 20:44, Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jun 19, 2025 at 8:11 PM Emil Tsalapatis <emil@xxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > Add a new BPF arena kfunc from protecting a range of pages. These pages
> > > > > cannot be allocated, either explicitly through bpf_arena_alloc_pages()
> > > > > or implicitly through userspace page faults.
> > > > >
> > > > > Signed-off-by: Emil Tsalapatis <emil@xxxxxxxxxxxxxxx>
> > > > > ---
> > > > >  kernel/bpf/arena.c | 95 ++++++++++++++++++++++++++++++++++++++++++++--
> > > > >  1 file changed, 92 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
> > > > > index 0d56cea71602..2f9293eb7151 100644
> > > > > --- a/kernel/bpf/arena.c
> > > > > +++ b/kernel/bpf/arena.c
> > > > > @@ -48,6 +48,7 @@ struct bpf_arena {
> > > > >         u64 user_vm_end;
> > > > >         struct vm_struct *kern_vm;
> > > > >         struct range_tree rt;
> > > > > +       struct range_tree rt_guard;
> > > >
> > > > ...
> > > >
> > > > >  }
> > > > > @@ -282,6 +298,11 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
> > > > >                 /* User space requested to segfault when page is not allocated by bpf prog */
> > > > >                 return VM_FAULT_SIGSEGV;
> > > > >
> > > > > +       /* Make sure the page is not guarded. */
> > > > > +       ret = is_range_tree_set(&arena->rt_guard, vmf->pgoff, 1);
> > > > > +       if (ret)
> > > > > +               return VM_FAULT_SIGSEGV;
> > > > > +
> > > > >         ret = range_tree_clear(&arena->rt, vmf->pgoff, 1);
> > > >
> > > > Why complicate things with another tree ?
> > > > The logic has to range_tree_clear(&arena->rt, ... anyway
> > > > and here check:
> > > > is_range_tree_set(&arena->rt, ...
> > > >
> > > > bpf_arena_guard_pages() won't have EALREADY errors, so be it.
> > > > Keeping another range_tree and spending kernel memory
> > > > just to produce an error to buggy bpf prog is imo wrong trade off.
> > >
> > > IIUC the main requirement is reserving a region that cannot be faulted
> > > in user space, and cannot be allocated from the BPF side.
> > > I would instead add a flag that when set overrides the SIGSEGV/page-in
> > > behavior (which can be set globally by a flag on the map).
> > > That sounds more generic and potentially useful to pick the behavior
> > > on a per-allocation basis instead of making it global.
> > > So for specific allocations, we get SEGSEGV instead of paging in
> > > memory, while for the rest it's the default based on map's flags.
> > > And to prevent anybody else from allocating this range, reserve it
> > > ahead of time in the scheduler's init() callback.
> > > For normal programs it can be an extra prog run before the program is
> > > attached and starts firing.
> > > We won't need a new kfunc either.
> >
> > I'm not following the idea.
> > There is already a flag:
> >         if (arena->map.map_flags & BPF_F_SEGV_ON_FAULT)
> >                 /* User space requested to segfault when page is not
> > allocated by bpf prog */
> >                 return VM_FAULT_SIGSEGV;
>
> Yeah, but it's global. We may want fault-in for normal memory on
> access, but we may want some range in the address space to have
> SEGV_ON_FAULT behavior.
> It allows some allocations to pick their fault-in/segv behavior
> independent of the arena user's choice, which I think is important for
> library like code / malloc etc.

You mean to augment range_tree with extra flags per range ?
I don't like it. So far no one has used BPF_F_SEGV_ON_FAULT.
Making it more granular won't make it more useful.

> The thing described above still wastes some memory because we map pages.
> I feel the proper way to do this is how one would do something like
> this in user space.
> Have pages mapped in a range and remove PROT_READ | PROT_WRITE from
> them, making them inaccessible.
> Ideally it'd be zero pages but if we cannot do that, wasting some
> memory for guard regions may not be too bad, for now.
> We need mprotect() like capabilities to change r/w permissions to
> create such guard pages, e.g. for a malloc with some debug features /
> efence-like protection.

I don't see the point of all that.
bpf_arena_guard_pages() that only reserve the range without
allocating the pages makes sense to me.
I think bpf_arena_reserve_range() would be a better name for such api.
Since actual pages are not allocated.
"guard pages" typically means actual pages are allocated and populated
with some pattern. Here it's not the case.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux