On Wed, Jul 16, 2025 at 6:35 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 7/16/25 04:29, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> >
> > Since kmalloc_nolock() can be called from any context,
> > ___slab_alloc() can acquire a local_trylock_t (which is an
> > rt_spin_lock in PREEMPT_RT) and then attempt to acquire a different
> > local_trylock_t while in the same task context.
> >
> > The calling sequence might look like:
> > kmalloc() -> tracepoint -> bpf -> kmalloc_nolock()
> >
> > or more precisely:
> > __lock_acquire+0x12ad/0x2590
> > lock_acquire+0x133/0x2d0
> > rt_spin_lock+0x6f/0x250
> > ___slab_alloc+0xb7/0xec0
> > kmalloc_nolock_noprof+0x15a/0x430
> > my_debug_callback+0x20e/0x390 [testmod]
> > ___slab_alloc+0x256/0xec0
> > __kmalloc_cache_noprof+0xd6/0x3b0
> >
> > Make LOCKDEP understand that local_trylock_t-s protect
> > different kmem_caches. In order to do that, add a lock_class_key
> > for each kmem_cache and use that key in its local_trylock_t.
> >
> > This stack trace is possible on both PREEMPT_RT and !PREEMPT_RT,
> > but teach lockdep about it only for PREEMPT_RT, since
> > in !PREEMPT_RT the code is using local_trylock_irqsave() only.
> >
> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
>
> Should we switch the order of patches 5 and 6, or is it sufficient that
> there are no callers of kmalloc_nolock() yet?

I can switch the order. I don't think it makes much difference.

> > ---
> >  mm/slab.h |  1 +
> >  mm/slub.c | 17 +++++++++++++++++
> >  2 files changed, 18 insertions(+)
> >
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 65f4616b41de..165737accb20 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -262,6 +262,7 @@ struct kmem_cache_order_objects {
> >  struct kmem_cache {
> >  #ifndef CONFIG_SLUB_TINY
> >  	struct kmem_cache_cpu __percpu *cpu_slab;
> > +	struct lock_class_key lock_key;
>
> I see " * The class key takes no space if lockdep is disabled:", ok good.

Yes. I was considering guarding it with #ifdef CONFIG_PREEMPT_RT,
but then all accesses to ->lock_key would need to be wrapped with
#ifdef as well, so init_kmem_cache_cpus() would have three #ifdef-s
instead of one. Hence the above lock_key is unconditional, though it's
a bit odd that it's completely unused on !RT (only the compiler
touches it).

> >  #endif
> >  	/* Used for retrieving partial slabs, etc. */
> >  	slab_flags_t flags;
> > diff --git a/mm/slub.c b/mm/slub.c
> > index c92703d367d7..526296778247 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3089,12 +3089,26 @@ static inline void note_cmpxchg_failure(const char *n,
> >
> >  static void init_kmem_cache_cpus(struct kmem_cache *s)
> >  {
> > +#ifdef CONFIG_PREEMPT_RT
> > +	/* Register lockdep key for non-boot kmem caches */
> > +	bool finegrain_lockdep = !init_section_contains(s, 1);
>
> I guess it's to avoid the "if (WARN_ON_ONCE(static_obj(key)))".
> If it means the two bootstrap caches get a different class just by
> being static, then I guess it works.

Yes. Not pretty. The alternative is to pass a flag through a bunch of
functions all the way from kmem_cache_init.
Another alternative is:
bool finegrain_lockdep = s != boot_kmem_cache_node && s != boot_kmem_cache;

Both are imo uglier.
init_section_contains() is more precise and matches static_obj().

Checking for SLAB_NO_OBJ_EXT isn't an option, since it's conditional.
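
For context, a rough sketch of how the per-cache key would be wired up
in init_kmem_cache_cpus() on this series. Helper names such as
local_trylock_init(), and the c->lock / c->tid fields, are assumed
from the surrounding slub code quoted above; the final patch may
differ in details:

/*
 * Sketch only: local_trylock_init(), c->lock, c->tid and init_tid()
 * are assumed from the surrounding slub code, not confirmed by this
 * hunk of the patch.
 */
static void init_kmem_cache_cpus(struct kmem_cache *s)
{
#ifdef CONFIG_PREEMPT_RT
	/* Register lockdep key for non-boot (non-static) kmem caches */
	bool finegrain_lockdep = !init_section_contains(s, 1);
#else
	/* !PREEMPT_RT only uses local_trylock_irqsave(), one class is enough */
	bool finegrain_lockdep = false;
#endif
	struct kmem_cache_cpu *c;
	int cpu;

	if (finegrain_lockdep)
		/* dynamic key avoids the static_obj() warning for slab-allocated caches */
		lockdep_register_key(&s->lock_key);

	for_each_possible_cpu(cpu) {
		c = per_cpu_ptr(s->cpu_slab, cpu);

		local_trylock_init(&c->lock);
		if (finegrain_lockdep)
			/* give each kmem_cache's per-cpu locks their own lockdep class */
			lockdep_set_class(&c->lock, &s->lock_key);
		c->tid = init_tid(cpu);
	}
}

Presumably a matching lockdep_unregister_key(&s->lock_key) is needed on
cache destruction so the dynamically registered key doesn't leak.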