On 7/16/25 04:29, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@xxxxxxxxxx>
>
> Since kmalloc_nolock() can be called from any context
> the ___slab_alloc() can acquire local_trylock_t (which is rt_spin_lock
> in PREEMPT_RT) and attempt to acquire a different local_trylock_t
> while in the same task context.
>
> The calling sequence might look like:
> kmalloc() -> tracepoint -> bpf -> kmalloc_nolock()
>
> or more precisely:
> __lock_acquire+0x12ad/0x2590
> lock_acquire+0x133/0x2d0
> rt_spin_lock+0x6f/0x250
> ___slab_alloc+0xb7/0xec0
> kmalloc_nolock_noprof+0x15a/0x430
> my_debug_callback+0x20e/0x390 [testmod]
> ___slab_alloc+0x256/0xec0
> __kmalloc_cache_noprof+0xd6/0x3b0
>
> Make LOCKDEP understand that local_trylock_t-s protect
> different kmem_caches. In order to do that add lock_class_key
> for each kmem_cache and use that key in local_trylock_t.
>
> This stack trace is possible on both PREEMPT_RT and !PREEMPT_RT,
> but teach lockdep about it only for PREEMPT_RT, since
> in !PREEMPT_RT the code is using local_trylock_irqsave() only.
>
> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>

Should we switch the order of patches 5 and 6, or is it sufficient that
there are no callers of kmalloc_nolock() yet?

> ---
>  mm/slab.h |  1 +
>  mm/slub.c | 17 +++++++++++++++++
>  2 files changed, 18 insertions(+)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 65f4616b41de..165737accb20 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -262,6 +262,7 @@ struct kmem_cache_order_objects {
>  struct kmem_cache {
>  #ifndef CONFIG_SLUB_TINY
>  	struct kmem_cache_cpu __percpu *cpu_slab;
> +	struct lock_class_key lock_key;

I see " * The class key takes no space if lockdep is disabled:", ok good.

>  #endif
>  	/* Used for retrieving partial slabs, etc. */
>  	slab_flags_t flags;
> diff --git a/mm/slub.c b/mm/slub.c
> index c92703d367d7..526296778247 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3089,12 +3089,26 @@ static inline void note_cmpxchg_failure(const char *n,
>
>  static void init_kmem_cache_cpus(struct kmem_cache *s)
>  {
> +#ifdef CONFIG_PREEMPT_RT
> +	/* Register lockdep key for non-boot kmem caches */
> +	bool finegrain_lockdep = !init_section_contains(s, 1);

I guess it's to avoid the "if (WARN_ON_ONCE(static_obj(key)))". If it means
the two bootstrap caches get a different class just by being static, then
I guess it works.

> +#else
> +	/*
> +	 * Don't bother with different lockdep classes for each
> +	 * kmem_cache, since we only use local_trylock_irqsave().
> +	 */
> +	bool finegrain_lockdep = false;
> +#endif
>  	int cpu;
>  	struct kmem_cache_cpu *c;
>
> +	if (finegrain_lockdep)
> +		lockdep_register_key(&s->lock_key);
>  	for_each_possible_cpu(cpu) {
>  		c = per_cpu_ptr(s->cpu_slab, cpu);
>  		local_trylock_init(&c->lock);
> +		if (finegrain_lockdep)
> +			lockdep_set_class(&c->lock, &s->lock_key);
>  		c->tid = init_tid(cpu);
>  	}
>  }
> @@ -5976,6 +5990,9 @@ void __kmem_cache_release(struct kmem_cache *s)
>  {
>  	cache_random_seq_destroy(s);
>  #ifndef CONFIG_SLUB_TINY
> +#ifdef CONFIG_PREEMPT_RT
> +	lockdep_unregister_key(&s->lock_key);
> +#endif
>  	free_percpu(s->cpu_slab);
>  #endif
>  	free_kmem_cache_nodes(s);
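
For readers following the thread, the dynamic-key pattern the patch relies on
looks roughly like this when used outside of SLUB. This is a minimal sketch:
struct my_cache, my_cache_create() and my_cache_destroy() are made-up names,
and a plain spinlock_t stands in for the local_trylock_t; only
lockdep_register_key(), lockdep_set_class() and lockdep_unregister_key() are
the real APIs the patch uses per kmem_cache.

#include <linux/lockdep.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Hypothetical cache-like object; stands in for struct kmem_cache. */
struct my_cache {
	spinlock_t lock;		/* stands in for local_trylock_t */
	struct lock_class_key lock_key;	/* zero-sized when lockdep is off */
};

static struct my_cache *my_cache_create(void)
{
	struct my_cache *c = kzalloc(sizeof(*c), GFP_KERNEL);

	if (!c)
		return NULL;
	/*
	 * lockdep_register_key() is for keys living in dynamically
	 * allocated memory; it WARNs on static keys, which are usable
	 * directly without registration. Registering a per-instance
	 * key and attaching it to the lock gives every instance its
	 * own lockdep class, so nesting locks of two different
	 * instances is not reported as recursive locking.
	 */
	lockdep_register_key(&c->lock_key);
	spin_lock_init(&c->lock);
	lockdep_set_class(&c->lock, &c->lock_key);
	return c;
}

static void my_cache_destroy(struct my_cache *c)
{
	/* The key must be unregistered before the memory is freed. */
	lockdep_unregister_key(&c->lock_key);
	kfree(c);
}

In the patch the same pairing sits in init_kmem_cache_cpus() and
__kmem_cache_release(), guarded by CONFIG_PREEMPT_RT and skipped for the
static bootstrap caches via the init_section_contains() check discussed above.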