Re: [PATCH bpf-next v2 1/2] bpf: Reject bpf_timer for PREEMPT_RT

On Mon, Sep 08, 2025 at 03:51:00PM -0700, Alexei Starovoitov wrote:
> On Mon, Sep 8, 2025 at 3:42 PM Peilin Ye <yepeilin@xxxxxxxxxx> wrote:
> > Just in case - actually there was a patch that does this:
> >
> > [2] https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@xxxxxxxxxx/
> >
> > It was then superseded by the patches you linked [1] above however,
> > since per discussion in [2], "use bpf_mem_alloc() to skip memcg
> > accounting because it can trigger hardlockups" is a workaround instead
> > of a proper fix.
> >
> > I wonder if this new issue on PREEMPT_RT would justify [2] over [1]?
> > IIUC, until kmalloc_nolock() becomes available:
> >
> > [1] (plus Leon's patch here) means no bpf_timer on PREEMPT_RT, but we
> > still have memcg accounting for non-PREEMPT_RT; [2] means no memcg
> > accounting.
> 
> I didn't comment on the above statement earlier, because
> I thought you meant "no memcg accounting _inline_",
> but reading above it sounds that you think that bpf_mem_alloc()
> doesn't do memcg accounting at all ?
> That's incorrect. bpf_mem_alloc() always uses memcg accounting

Ah, I see - kernel/bpf/memalloc.c:alloc_bulk() via irq_work.  Thanks for
the correction!

> , but the usage is nuanced. bpf_global_ma is counted towards root memcg,
> since it's created during boot. While hash map powered by bpf_mem_alloc
> is using memcg of the user that created that map.

- - -
IIUC, the "sleeping function called from invalid context" message on
PREEMPT_RT appears because ___slab_alloc() does local_lock_irqsave(),
which is a sleeping lock on PREEMPT_RT, while IRQs have already been
disabled by __bpf_async_init():

        __bpf_spin_lock_irqsave(&async->lock);
        t = async->timer;
        if (t) {
                ret = -EBUSY;
                goto out;
        }

        /* allocate hrtimer via map_kmalloc to use memcg accounting */
        cb = bpf_map_kmalloc_node(map, size, __GFP_HIGH, map->numa_node);
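
To make the ordering concrete, here is a toy user-space model of my
understanding. It is not kernel code and every model_* name is made up;
it only tracks an "IRQs off" flag and complains when a lock that may
sleep is taken while that flag is set:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: none of these names exist in the kernel. */
bool model_irqs_off;   /* stands in for the local IRQ state */
int model_splats;      /* number of "invalid context" reports so far */

/* Stands in for a lock that may sleep, like local_lock on PREEMPT_RT. */
void model_sleeping_lock(const char *caller)
{
    if (model_irqs_off) {
        model_splats++;
        printf("BUG: sleeping function called from invalid context (%s)\n",
               caller);
    }
}

/* Stands in for the slab slow path, which takes the per-CPU slab lock. */
void *model_slab_alloc(void)
{
    static char obj[64];

    model_sleeping_lock("model_slab_alloc");
    return obj;
}

/* Same ordering as the snippet above: IRQs off first, then allocate. */
void *model_async_init(void)
{
    void *cb;

    model_irqs_off = true;      /* like __bpf_spin_lock_irqsave() */
    cb = model_slab_alloc();    /* like bpf_map_kmalloc_node() */
    model_irqs_off = false;     /* like the matching unlock */
    return cb;
}
```

Calling model_slab_alloc() on its own reports nothing, while
model_async_init() reports once, because the allocation happens inside
the IRQ-off section.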

For my understanding: is kmalloc_nolock() going to resolve this, and if
so, how?
Patch [3] changes ___slab_alloc() to:

          /* must check again c->slab in case we got preempted and it changed */
 -        local_lock_irqsave(&s->cpu_slab->lock, flags);
 +        local_lock_cpu_slab(s, &flags);

But on PREEMPT_RT, local_lock_cpu_slab() still does
local_lock_irqsave(), and the comment says that we can't call it with
IRQs disabled:

 +         * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
 +         * disabled context. The lock will always be acquired and if needed it
 +         * block and sleep until the lock is available.

So it seems that we'll still have this "sleeping function called from
invalid context" issue for PREEMPT_RT even if we make __bpf_async_init()
use bpf_mem_alloc() (when the latter uses kmalloc_nolock())?

[3]
[PATCH v3 5/6] slab: Introduce kmalloc_nolock() and kfree_nolock().
https://lore.kernel.org/all/20250716022950.69330-6-alexei.starovoitov@xxxxxxxxx/

Thanks,
Peilin Ye




