On Thu, Jul 10, 2025 at 11:36:02AM +0200, Vlastimil Babka wrote:
> On 7/9/25 03:53, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> >
> > kmalloc_nolock() relies on the ability of local_lock to detect the
> > situation when it's locked.
> > In !PREEMPT_RT local_lock_is_locked() is true only when an NMI happened
> > in the irq-saved region that protects _that specific_ per-cpu
> > kmem_cache_cpu. In that case retry the operation in a different
> > kmalloc bucket. The second attempt will likely succeed, since this
> > cpu locked a different kmem_cache_cpu.
> >
> > Similarly, in PREEMPT_RT local_lock_is_locked() returns true when the
> > per-cpu rt_spin_lock is locked by the current task. In this case
> > re-entrance into the same kmalloc bucket is unsafe, and
> > kmalloc_nolock() tries a different bucket that is most likely not
> > locked by the current task. Though it may be locked by a different
> > task, it's safe to rt_spin_lock() on it.
> >
> > Similar to alloc_pages_nolock(), kmalloc_nolock() returns NULL
> > immediately if called from hard irq or NMI in PREEMPT_RT.
> >
> > kfree_nolock() defers freeing to irq_work when local_lock_is_locked()
> > and in_nmi(), or in PREEMPT_RT.
> >
> > The SLUB_TINY config doesn't use local_lock_is_locked() and relies on
> > spin_trylock_irqsave(&n->list_lock) to allocate, while kfree_nolock()
> > always defers to irq_work.
> >
> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>

> > @@ -3911,6 +3953,12 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >  		void *flush_freelist = c->freelist;
> >  		struct slab *flush_slab = c->slab;
> >  
> > +		if (unlikely(!allow_spin))
> > +			/*
> > +			 * Reentrant slub cannot take locks
> > +			 * necessary for deactivate_slab()
> > +			 */
> > +			return NULL;
>
> Hm, but this is leaking the slab we allocated and have in the "slab"
> variable, we need to free it back in that case.

But it might be a partial slab taken from the list?
Then we need to trylock n->list_lock, and if that fails, oh...

> >  		c->slab = NULL;
> >  		c->freelist = NULL;
> >  		c->tid = next_tid(c->tid);

--
Cheers,
Harry / Hyeonggon
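
P.S. A rough, untested sketch of the kind of putback I had in mind: take
n->list_lock with a trylock and return the slab to the node partial list
if that succeeds, otherwise defer. defer_free_to_irq_work() below is a
made-up placeholder for whatever deferral mechanism the series ends up
using; the rest (get_node(), slab_nid(), add_partial()) is existing
mm/slub.c machinery.

	static void put_back_partial_nolock(struct kmem_cache *s,
					    struct slab *slab)
	{
		struct kmem_cache_node *n = get_node(s, slab_nid(slab));
		unsigned long flags;

		if (spin_trylock_irqsave(&n->list_lock, flags)) {
			/* got the lock without spinning, safe to put back */
			add_partial(n, slab, DEACTIVATE_TO_TAIL);
			spin_unlock_irqrestore(&n->list_lock, flags);
		} else {
			/*
			 * Cannot spin in the reentrant path; the slab would
			 * have to be handed off, e.g. via irq_work.
			 */
			defer_free_to_irq_work(slab); /* placeholder */
		}
	}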