On Mon, Sep 08, 2025 at 03:51:00PM -0700, Alexei Starovoitov wrote: > On Mon, Sep 8, 2025 at 3:42 PM Peilin Ye <yepeilin@xxxxxxxxxx> wrote: > > Just in case - actually there was a patch that does this: > > > > [2] https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@xxxxxxxxxx/ > > > > It was then superseded by the patches you linked [1] above however, > > since per discussion in [2], "use bpf_mem_alloc() to skip memcg > > accounting because it can trigger hardlockups" is a workaround instead > > of a proper fix. > > > > I wonder if this new issue on PREEMPT_RT would justify [2] over [1]? > > IIUC, until kmalloc_nolock() becomes available: > > > > [1] (plus Leon's patch here) means no bpf_timer on PREEMPT_RT, but we > > still have memcg accounting for non-PREEMPT_RT; [2] means no memcg > > accounting. > > I didn't comment on the above statement earlier, because > I thought you meant "no memcg accounting _inline_", > but reading above it sounds that you think that bpf_mem_alloc() > doesn't do memcg accounting at all ? > That's incorrect. bpf_mem_alloc() always uses memcg accounting Ah, I see - kernel/bpf/memalloc.c:alloc_bulk() via irq_work. Thanks for the correction! > , but the usage is nuanced. bpf_global_ma is counted towards root memcg, > since it's created during boot. While hash map powered by bpf_mem_alloc > is using memcg of the user that created that map. - - - IIUC, this "sleeping function called from invalid context" message on PREEMPT_RT is because ___slab_alloc() does local_lock_irqsave(), with IRQ disabled by __bpf_async_init(): __bpf_spin_lock_irqsave(&async->lock); t = async->timer; if (t) { ret = -EBUSY; goto out; } /* allocate hrtimer via map_kmalloc to use memcg accounting */ cb = bpf_map_kmalloc_node(map, size, __GFP_HIGH, map->numa_node); For my understanding, is/how is kmalloc_nolock() going to resolve this? Patch [3] changes ___slab_alloc() to: /* must check again c->slab in case we got preempted and it changed */ - local_lock_irqsave(&s->cpu_slab->lock, flags); + local_lock_cpu_slab(s, &flags); But for PREEMPT_RT, local_lock_cpu_slab() still does local_lock_irqsave(), and the comment says that we can't call it with IRQ disabled: + * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt + * disabled context. The lock will always be acquired and if needed it + * block and sleep until the lock is available. So it seems that we'll still have this "sleeping function called from invalid context" issue for PREEMPT_RT even if we make __bpf_async_init() use bpf_mem_alloc() (when the latter uses kmalloc_nolock())? [3] [PATCH v3 5/6] slab: Introduce kmalloc_nolock() and kfree_nolock(). https://lore.kernel.org/all/20250716022950.69330-6-alexei.starovoitov@xxxxxxxxx/ Thanks, Peilin Ye