Re: [PATCH bpf-next 1/2] bpf/arena: add bpf_arena_guard_pages kfunc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 20 Jun 2025 at 20:57, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, Jun 20, 2025 at 11:52 AM Kumar Kartikeya Dwivedi
> <memxor@xxxxxxxxx> wrote:
> >
> > On Fri, 20 Jun 2025 at 20:44, Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Thu, Jun 19, 2025 at 8:11 PM Emil Tsalapatis <emil@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > Add a new BPF arena kfunc from protecting a range of pages. These pages
> > > > cannot be allocated, either explicitly through bpf_arena_alloc_pages()
> > > > or implicitly through userspace page faults.
> > > >
> > > > Signed-off-by: Emil Tsalapatis <emil@xxxxxxxxxxxxxxx>
> > > > ---
> > > >  kernel/bpf/arena.c | 95 ++++++++++++++++++++++++++++++++++++++++++++--
> > > >  1 file changed, 92 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
> > > > index 0d56cea71602..2f9293eb7151 100644
> > > > --- a/kernel/bpf/arena.c
> > > > +++ b/kernel/bpf/arena.c
> > > > @@ -48,6 +48,7 @@ struct bpf_arena {
> > > >         u64 user_vm_end;
> > > >         struct vm_struct *kern_vm;
> > > >         struct range_tree rt;
> > > > +       struct range_tree rt_guard;
> > >
> > > ...
> > >
> > > >  }
> > > > @@ -282,6 +298,11 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
> > > >                 /* User space requested to segfault when page is not allocated by bpf prog */
> > > >                 return VM_FAULT_SIGSEGV;
> > > >
> > > > +       /* Make sure the page is not guarded. */
> > > > +       ret = is_range_tree_set(&arena->rt_guard, vmf->pgoff, 1);
> > > > +       if (ret)
> > > > +               return VM_FAULT_SIGSEGV;
> > > > +
> > > >         ret = range_tree_clear(&arena->rt, vmf->pgoff, 1);
> > >
> > > Why complicate things with another tree ?
> > > The logic has to range_tree_clear(&arena->rt, ... anyway
> > > and here check:
> > > is_range_tree_set(&arena->rt, ...
> > >
> > > bpf_arena_guard_pages() won't have EALREADY errors, so be it.
> > > Keeping another range_tree and spending kernel memory
> > > just to produce an error to buggy bpf prog is imo wrong trade off.
> >
> > IIUC the main requirement is reserving a region that cannot be faulted
> > in user space, and cannot be allocated from the BPF side.
> > I would instead add a flag that when set overrides the SIGSEGV/page-in
> > behavior (which can be set globally by a flag on the map).
> > That sounds more generic and potentially useful to pick the behavior
> > on a per-allocation basis instead of making it global.
> > So for specific allocations, we get SEGSEGV instead of paging in
> > memory, while for the rest it's the default based on map's flags.
> > And to prevent anybody else from allocating this range, reserve it
> > ahead of time in the scheduler's init() callback.
> > For normal programs it can be an extra prog run before the program is
> > attached and starts firing.
> > We won't need a new kfunc either.
>
> I'm not following the idea.
> There is already a flag:
>         if (arena->map.map_flags & BPF_F_SEGV_ON_FAULT)
>                 /* User space requested to segfault when page is not
> allocated by bpf prog */
>                 return VM_FAULT_SIGSEGV;

Yeah, but it's global. We may want fault-in for normal memory on
access, but we may want some range in the address space to have
SEGV_ON_FAULT behavior.
It allows some allocations to pick their fault-in/segv behavior
independent of the arena user's choice, which I think is important for
library like code / malloc etc.

The thing described above still wastes some memory because we map pages.
I feel the proper way to do this is how one would do something like
this in user space.
Have pages mapped in a range and remove PROT_READ | PROT_WRITE from
them, making them inaccessible.
Ideally it'd be zero pages but if we cannot do that, wasting some
memory for guard regions may not be too bad, for now.
We need mprotect() like capabilities to change r/w permissions to
create such guard pages, e.g. for a malloc with some debug features /
efence-like protection.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux