On 7/16/25 04:29, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@xxxxxxxxxx>
>
> Since kmalloc_nolock() can be called from any context
> the ___slab_alloc() can acquire local_trylock_t (which is rt_spin_lock
> in PREEMPT_RT) and attempt to acquire a different local_trylock_t
> while in the same task context.
>
> The calling sequence might look like:
> kmalloc() -> tracepoint -> bpf -> kmalloc_nolock()
>
> or more precisely:
> __lock_acquire+0x12ad/0x2590
> lock_acquire+0x133/0x2d0
> rt_spin_lock+0x6f/0x250
> ___slab_alloc+0xb7/0xec0
> kmalloc_nolock_noprof+0x15a/0x430
> my_debug_callback+0x20e/0x390 [testmod]
> ___slab_alloc+0x256/0xec0
> __kmalloc_cache_noprof+0xd6/0x3b0
>
> Make LOCKDEP understand that local_trylock_t-s protect
> different kmem_caches. In order to do that add lock_class_key
> for each kmem_cache and use that key in local_trylock_t.
>
> This stack trace is possible on both PREEMPT_RT and !PREEMPT_RT,
> but teach lockdep about it only for PREEMPT_RT, since
> in !PREEMPT_RT the code is using local_trylock_irqsave() only.
>
> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>

Should we switch the order of patches 5 and 6, or is it sufficient that
there are no callers of kmalloc_nolock() yet?

> ---
>  mm/slab.h |  1 +
>  mm/slub.c | 17 +++++++++++++++++
>  2 files changed, 18 insertions(+)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 65f4616b41de..165737accb20 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -262,6 +262,7 @@ struct kmem_cache_order_objects {
>  struct kmem_cache {
>  #ifndef CONFIG_SLUB_TINY
>  	struct kmem_cache_cpu __percpu *cpu_slab;
> +	struct lock_class_key lock_key;

I see " * The class key takes no space if lockdep is disabled:", ok good.

>  #endif
>  	/* Used for retrieving partial slabs, etc. */
>  	slab_flags_t flags;
> diff --git a/mm/slub.c b/mm/slub.c
> index c92703d367d7..526296778247 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3089,12 +3089,26 @@ static inline void note_cmpxchg_failure(const char *n,
>
>  static void init_kmem_cache_cpus(struct kmem_cache *s)
>  {
> +#ifdef CONFIG_PREEMPT_RT
> +	/* Register lockdep key for non-boot kmem caches */
> +	bool finegrain_lockdep = !init_section_contains(s, 1);

I guess it's to avoid the "if (WARN_ON_ONCE(static_obj(key)))". If it means
the two bootstrap caches get a different class just by being static, then
I guess it works.

> +#else
> +	/*
> +	 * Don't bother with different lockdep classes for each
> +	 * kmem_cache, since we only use local_trylock_irqsave().
> +	 */
> +	bool finegrain_lockdep = false;
> +#endif
>  	int cpu;
>  	struct kmem_cache_cpu *c;
>
> +	if (finegrain_lockdep)
> +		lockdep_register_key(&s->lock_key);
>  	for_each_possible_cpu(cpu) {
>  		c = per_cpu_ptr(s->cpu_slab, cpu);
>  		local_trylock_init(&c->lock);
> +		if (finegrain_lockdep)
> +			lockdep_set_class(&c->lock, &s->lock_key);
>  		c->tid = init_tid(cpu);
>  	}
>  }
> @@ -5976,6 +5990,9 @@ void __kmem_cache_release(struct kmem_cache *s)
>  {
>  	cache_random_seq_destroy(s);
>  #ifndef CONFIG_SLUB_TINY
> +#ifdef CONFIG_PREEMPT_RT
> +	lockdep_unregister_key(&s->lock_key);
> +#endif
>  	free_percpu(s->cpu_slab);
>  #endif
>  	free_kmem_cache_nodes(s);
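
For readers following the thread, the dynamic-key pattern the patch relies on
looks roughly like this when used outside of SLUB. This is a minimal sketch:
struct my_cache, my_cache_create() and my_cache_destroy() are made-up names,
and a plain spinlock_t stands in for the local_trylock_t; only
lockdep_register_key(), lockdep_set_class() and lockdep_unregister_key() are
the real APIs the patch uses per kmem_cache.

#include <linux/lockdep.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Hypothetical cache-like object; stands in for struct kmem_cache. */
struct my_cache {
	spinlock_t lock;		/* stands in for local_trylock_t */
	struct lock_class_key lock_key;	/* zero-sized when lockdep is off */
};

static struct my_cache *my_cache_create(void)
{
	struct my_cache *c = kzalloc(sizeof(*c), GFP_KERNEL);

	if (!c)
		return NULL;
	/*
	 * lockdep_register_key() is for keys living in dynamically
	 * allocated memory; it WARNs on static keys, which are usable
	 * directly without registration. Registering a per-instance
	 * key and attaching it to the lock gives every instance its
	 * own lockdep class, so nesting locks of two different
	 * instances is not reported as recursive locking.
	 */
	lockdep_register_key(&c->lock_key);
	spin_lock_init(&c->lock);
	lockdep_set_class(&c->lock, &c->lock_key);
	return c;
}

static void my_cache_destroy(struct my_cache *c)
{
	/* The key must be unregistered before the memory is freed. */
	lockdep_unregister_key(&c->lock_key);
	kfree(c);
}

In the patch the same pairing sits in init_kmem_cache_cpus() and
__kmem_cache_release(), guarded by CONFIG_PREEMPT_RT and skipped for the
static bootstrap caches via the init_section_contains() check discussed above.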