Re: [PATCH bpf-next v2 1/2] bpf: Reject bpf_timer for PREEMPT_RT

On Tue, Sep 9, 2025 at 2:00 AM Peilin Ye <yepeilin@xxxxxxxxxx> wrote:
>
> On Mon, Sep 08, 2025 at 03:51:00PM -0700, Alexei Starovoitov wrote:
> > On Mon, Sep 8, 2025 at 3:42 PM Peilin Ye <yepeilin@xxxxxxxxxx> wrote:
> > > Just in case - actually there was a patch that does this:
> > >
> > > [2] https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@xxxxxxxxxx/
> > >
> > > However, it was then superseded by the patches [1] you linked above,
> > > since per the discussion in [2], "use bpf_mem_alloc() to skip memcg
> > > accounting because it can trigger hardlockups" is a workaround rather
> > > than a proper fix.
> > >
> > > I wonder if this new issue on PREEMPT_RT would justify [2] over [1]?
> > > IIUC, until kmalloc_nolock() becomes available:
> > >
> > > [1] (plus Leon's patch here) means no bpf_timer on PREEMPT_RT, but we
> > > still have memcg accounting for non-PREEMPT_RT; [2] means no memcg
> > > accounting.
> >
> > I didn't comment on the above statement earlier, because
> > I thought you meant "no memcg accounting _inline_",
> > but reading the above, it sounds like you think that bpf_mem_alloc()
> > doesn't do memcg accounting at all ?
> > That's incorrect. bpf_mem_alloc() always uses memcg accounting
>
> Ah, I see - kernel/bpf/memalloc.c:alloc_bulk() via irq_work.  Thanks for
> the correction!
>
> > , but the usage is nuanced. bpf_global_ma is counted toward the root
> > memcg, since it's created during boot, while a hash map powered by
> > bpf_mem_alloc uses the memcg of the user that created that map.
>
> - - -
> IIUC, this "sleeping function called from invalid context" message on
> PREEMPT_RT is because ___slab_alloc() does local_lock_irqsave(), with
> IRQ disabled by __bpf_async_init():
>
>         __bpf_spin_lock_irqsave(&async->lock);
>         t = async->timer;
>         if (t) {
>                 ret = -EBUSY;
>                 goto out;
>         }
>
>         /* allocate hrtimer via map_kmalloc to use memcg accounting */
>         cb = bpf_map_kmalloc_node(map, size, __GFP_HIGH, map->numa_node);
>
> For my understanding: is kmalloc_nolock() going to resolve this, and if
> so, how?
> Patch [3] changes ___slab_alloc() to:
>
>           /* must check again c->slab in case we got preempted and it changed */
>  -        local_lock_irqsave(&s->cpu_slab->lock, flags);
>  +        local_lock_cpu_slab(s, &flags);
>
> But for PREEMPT_RT, local_lock_cpu_slab() still does
> local_lock_irqsave(), and the comment says that we can't call it with
> IRQs disabled:
>
>  +         * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
>  +         * disabled context. The lock will always be acquired and, if
>  +         * needed, it will block and sleep until the lock is available.

Of course. Not saying that s/kmalloc/kmalloc_nolock/ is a silver bullet.
bpf_timer needs other work. __bpf_spin_lock_irqsave() needs to go, etc.




