On Wed, Jul 16, 2025 at 6:35 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 7/16/25 04:29, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> >
> > Since kmalloc_nolock() can be called from any context,
> > ___slab_alloc() can acquire a local_trylock_t (which is an
> > rt_spin_lock in PREEMPT_RT) and then attempt to acquire a different
> > local_trylock_t while in the same task context.
> >
> > The calling sequence might look like:
> > kmalloc() -> tracepoint -> bpf -> kmalloc_nolock()
> >
> > or more precisely:
> > __lock_acquire+0x12ad/0x2590
> > lock_acquire+0x133/0x2d0
> > rt_spin_lock+0x6f/0x250
> > ___slab_alloc+0xb7/0xec0
> > kmalloc_nolock_noprof+0x15a/0x430
> > my_debug_callback+0x20e/0x390 [testmod]
> > ___slab_alloc+0x256/0xec0
> > __kmalloc_cache_noprof+0xd6/0x3b0
> >
> > Make LOCKDEP understand that local_trylock_t-s protect
> > different kmem_caches. In order to do that, add a lock_class_key
> > for each kmem_cache and use that key in its local_trylock_t.
> >
> > This stack trace is possible on both PREEMPT_RT and !PREEMPT_RT,
> > but teach lockdep about it only for PREEMPT_RT, since
> > in !PREEMPT_RT the code is using local_trylock_irqsave() only.
> >
> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
>
> Should we switch the order of patches 5 and 6, or is it sufficient that
> there are no callers of kmalloc_nolock() yet?

I can switch the order. I don't think it makes much difference.

> > ---
> >  mm/slab.h |  1 +
> >  mm/slub.c | 17 +++++++++++++++++
> >  2 files changed, 18 insertions(+)
> >
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 65f4616b41de..165737accb20 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -262,6 +262,7 @@ struct kmem_cache_order_objects {
> >  struct kmem_cache {
> >  #ifndef CONFIG_SLUB_TINY
> >  	struct kmem_cache_cpu __percpu *cpu_slab;
> > +	struct lock_class_key lock_key;
>
> I see " * The class key takes no space if lockdep is disabled:", ok good.

Yes. I was considering guarding it with #ifdef CONFIG_PREEMPT_RT,
but then all accesses to ->lock_key would need to be wrapped with
#ifdef as well, so init_kmem_cache_cpus() would have three #ifdef-s
instead of one. Hence the above lock_key is unconditional, though it's
a bit odd that it's completely unused on !RT (only the compiler
touches it).

> >  #endif
> >  	/* Used for retrieving partial slabs, etc. */
> >  	slab_flags_t flags;
> > diff --git a/mm/slub.c b/mm/slub.c
> > index c92703d367d7..526296778247 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3089,12 +3089,26 @@ static inline void note_cmpxchg_failure(const char *n,
> >
> >  static void init_kmem_cache_cpus(struct kmem_cache *s)
> >  {
> > +#ifdef CONFIG_PREEMPT_RT
> > +	/* Register lockdep key for non-boot kmem caches */
> > +	bool finegrain_lockdep = !init_section_contains(s, 1);
>
> I guess it's to avoid the "if (WARN_ON_ONCE(static_obj(key)))".
> If it means the two bootstrap caches get a different class just by
> being static, then I guess it works.

Yes. Not pretty. The alternative is to pass a flag through a bunch of
functions all the way from kmem_cache_init.
Another alternative is:
bool finegrain_lockdep = s != boot_kmem_cache_node && s != boot_kmem_cache;

Both are imo uglier.
init_section_contains() is more precise and matches static_obj().

Checking for SLAB_NO_OBJ_EXT isn't an option, since it's conditional.
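
For context, a rough sketch of how the per-cache key would be wired up
in init_kmem_cache_cpus() on this series. Helper names such as
local_trylock_init(), and the c->lock / c->tid fields, are assumed
from the surrounding slub code quoted above; the final patch may
differ in details:

/*
 * Sketch only: local_trylock_init(), c->lock, c->tid and init_tid()
 * are assumed from the surrounding slub code, not confirmed by this
 * hunk of the patch.
 */
static void init_kmem_cache_cpus(struct kmem_cache *s)
{
#ifdef CONFIG_PREEMPT_RT
	/* Register lockdep key for non-boot (non-static) kmem caches */
	bool finegrain_lockdep = !init_section_contains(s, 1);
#else
	/* !PREEMPT_RT only uses local_trylock_irqsave(), one class is enough */
	bool finegrain_lockdep = false;
#endif
	struct kmem_cache_cpu *c;
	int cpu;

	if (finegrain_lockdep)
		/* dynamic key avoids the static_obj() warning for slab-allocated caches */
		lockdep_register_key(&s->lock_key);

	for_each_possible_cpu(cpu) {
		c = per_cpu_ptr(s->cpu_slab, cpu);

		local_trylock_init(&c->lock);
		if (finegrain_lockdep)
			/* give each kmem_cache's per-cpu locks their own lockdep class */
			lockdep_set_class(&c->lock, &s->lock_key);
		c->tid = init_tid(cpu);
	}
}

Presumably a matching lockdep_unregister_key(&s->lock_key) is needed on
cache destruction so the dynamically registered key doesn't leak.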