On Tue, May 13, 2025 at 08:58:43AM +0200, Vlastimil Babka wrote:
> On 5/12/25 19:16, Alexei Starovoitov wrote:
> > On Mon, May 12, 2025 at 7:04 AM Sebastian Andrzej Siewior
> > <bigeasy@xxxxxxxxxxxxx> wrote:
> >>
> >> On 2025-04-30 20:27:16 [-0700], Alexei Starovoitov wrote:
> >> > --- a/include/linux/local_lock_internal.h
> >> > +++ b/include/linux/local_lock_internal.h
> >> > @@ -168,6 +168,15 @@ do { \
> >> >  /* preemption or migration must be disabled before calling __local_lock_is_locked */
> >> >  #define __local_lock_is_locked(lock) READ_ONCE(this_cpu_ptr(lock)->acquired)
> >> >
> >> > +#define __local_lock_irqsave_check(lock, flags)                        \
> >> > +        do {                                                           \
> >> > +                if (IS_ENABLED(CONFIG_DEBUG_LOCK_ALLOC) &&             \
> >> > +                    (!__local_lock_is_locked(lock) || in_nmi()))       \
> >> > +                        WARN_ON_ONCE(!__local_trylock_irqsave(lock, flags)); \
> >> > +                else                                                   \
> >> > +                        __local_lock_irqsave(lock, flags);             \
> >> > +        } while (0)
> >> > +
> >>
> >> Hmm. If I see this right in SLUB then this is called from preemptible
> >> context. Therefore the this_cpu_ptr() from __local_lock_is_locked()
> >> should trigger a warning here.
> >
> > When preemptible the migration is disabled. So no warning.
> >
> >> This check variant provides only additional debugging and otherwise
> >> behaves as local_lock_irqsave(). Therefore the in_nmi() should return
> >> immediately with a WARN_ON() regardless if the lock is available or not
> >> because the non-try variant should never be invoked from an NMI.
> >
> > non-try variant can be invoked from NMI, because the earlier
> > __local_lock_is_locked() check tells us that the lock is not locked.
> > And it's safe to do.
> > And that's the main challenge here.
> > local_lock_irqsave_check() macro fights lockdep here.
> >
> >> This looks like additional debug infrastructure that should be part of
> >> local_lock_irqsave() itself,
> >
> > The pattern of
> >
> > if (!__local_lock_is_locked(lock)) {
> >    .. lots of code..
> >    local_lock_irqsave(lock);
> >
> > is foreign to lockdep.
> >
> > Since it can be called from NMI the lockdep just hates it:
> >
> > [ 1021.956825] inconsistent {INITIAL USE} -> {IN-NMI} usage.
> > ...
> > [ 1021.956888]   lock(per_cpu_ptr(&lock));
> > [ 1021.956890]   <Interrupt>
> > [ 1021.956891]     lock(per_cpu_ptr(&lock));
> > ..
> >
> > and technically lockdep is correct.
> > For any normal lock it's a deadlock waiting to happen,
> > but not here.
> >
> > Even without NMI the lockdep doesn't like it:
> > [   14.627331] page_alloc_kthr/1965 is trying to acquire lock:
> > [   14.627331] ffff8881f6ebe0f0 ((local_lock_t
> > *)&c->lock){-.-.}-{3:3}, at: ___slab_alloc+0x9a9/0x1ab0
> > [   14.627331]
> > [   14.627331] but task is already holding lock:
> > [   14.627331] ffff8881f6ebd490 ((local_lock_t
> > *)&c->lock){-.-.}-{3:3}, at: ___slab_alloc+0xc7/0x1ab0
> > [   14.627331]
> > [   14.627331] other info that might help us debug this:
> > [   14.627331]  Possible unsafe locking scenario:
> > [   14.627331]
> > [   14.627331]        CPU0
> > [   14.627331]        ----
> > [   14.627331]   lock((local_lock_t *)&c->lock);
> > [   14.627331]   lock((local_lock_t *)&c->lock);
> >
> > When slub is holding lock ...bd490 we detect it with
> > __local_lock_is_locked(),
> > then we check that lock ..be0f0 is not locked,
> > and proceed to acquire it, but
> > lockdep will show the above splat.
> >
> > So local_lock_irqsave_check() is a workaround to avoid
> > these two false positives from lockdep.
> >
> > Yours and Vlastimil's observation is correct, that ideally
> > local_lock_irqsave() should just handle it,
> > but I don't see how to do it.
> > How can lockdep understand the if (!locked()) lock() pattern ?
> > Such usage is correct only for per-cpu local lock when migration
> > is disabled from check to acquire.
>
> Thanks, I think I finally understand the issue and why a _check variant is
> necessary. As a general note as this is so tricky, having more details in
> comments and commit messages can't hurt so we can understand it sooner :)
>
> Again this would be all simpler if we could just use trylock instead of
> _check(), but then we need to handle the fallbacks. And AFAIU on RT trylock
> can fail "spuriously", i.e. when we don't really preempt ourselves, as we
> discussed in that memcg thread.
>
> > Hence the macro is doing:
> > if (IS_ENABLED(CONFIG_DEBUG_LOCK_ALLOC) &&
> >     (!__local_lock_is_locked(lock) || in_nmi()))
> >         WARN_ON_ONCE(!__local_trylock_irqsave(lock, flags));
> >
> > in_nmi() part is a workaround for the first lockdep splat
> > and __local_lock_is_locked() is a workaround for 2nd lockdep splat,
> > though the code did __local_lock_is_locked() check already.
>
> So here's where this would be useful to have that info in a comment.
> However, I wonder about it, as the code uses __local_trylock_irqsave(), so
> lockdep should see it as an opportunistic attempt and not splat as that
> trylock alone should be avoiding deadlock - if not we might have a bug in
> the lockdep bits of trylock.

Point taken. The comments need to be more detailed.

I've been thinking of a way to avoid local_lock_irqsave_check() and came up
with the following:

diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
index 94be15d574ad..58ac29f4ba9b 100644
--- a/include/linux/local_lock_internal.h
+++ b/include/linux/local_lock_internal.h
@@ -79,7 +79,7 @@ do { \
                                                                        \
        debug_check_no_locks_freed((void *)lock, sizeof(*lock));       \
        lockdep_init_map_type(&(lock)->dep_map, #lock, &__key,         \
-                             0, LD_WAIT_CONFIG, LD_WAIT_INV,          \
+                             1, LD_WAIT_CONFIG, LD_WAIT_INV,          \
                              LD_LOCK_PERCPU);                         \
        local_lock_debug_init(lock);                                   \
 } while (0)
@@ -166,11 +166,21 @@ do { \
 })

 /* preemption or migration must be disabled before calling __local_lock_is_locked */
-#define __local_lock_is_locked(lock) READ_ONCE(this_cpu_ptr(lock)->acquired)
+#define __local_lock_is_locked(lock)                                   \
+       ({                                                              \
+               bool ret = READ_ONCE(this_cpu_ptr(lock)->acquired);     \
+                                                                       \
+               if (!ret)                                               \
+                       this_cpu_ptr(lock)->dep_map.flags = LOCAL_LOCK_UNLOCKED;\
+               ret;                                                    \
+       })
+
+#define __local_lock_flags_clear(lock)                                 \
+       do { this_cpu_ptr(lock)->dep_map.flags = 0; } while (0)

It would need to be wrapped into macros for !LOCKDEP, of course.
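For illustration only, the !LOCKDEP wrapping could look roughly like the
sketch below. The helper name local_lock_dep_mark_unlocked() is made up
here, and guarding on CONFIG_DEBUG_LOCK_ALLOC (where dep_map exists) is an
assumption on my part:

#ifdef CONFIG_DEBUG_LOCK_ALLOC
/* dep_map (and thus dep_map.flags) only exists with lockdep */
# define local_lock_dep_mark_unlocked(lock)                             \
        do { this_cpu_ptr(lock)->dep_map.flags = LOCAL_LOCK_UNLOCKED; } while (0)
# define __local_lock_flags_clear(lock)                                 \
        do { this_cpu_ptr(lock)->dep_map.flags = 0; } while (0)
#else
# define local_lock_dep_mark_unlocked(lock)     do { } while (0)
# define __local_lock_flags_clear(lock)         do { } while (0)
#endif

/* same logic as above: report 'locked', and mark the trusted region if not */
#define __local_lock_is_locked(lock)                                    \
        ({                                                              \
                bool ret = READ_ONCE(this_cpu_ptr(lock)->acquired);     \
                                                                        \
                if (!ret)                                               \
                        local_lock_dep_mark_unlocked(lock);             \
                ret;                                                    \
        })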
diff --git a/include/linux/lockdep_types.h b/include/linux/lockdep_types.h
index 9f361d3ab9d9..6c580081ace3 100644
--- a/include/linux/lockdep_types.h
+++ b/include/linux/lockdep_types.h
@@ -190,13 +190,15 @@ struct lockdep_map {
        u8                              wait_type_outer; /* can be taken in this context */
        u8                              wait_type_inner; /* presents this context */
        u8                              lock_type;
-       /* u8                           hole; */
+       u8                              flags;
 #ifdef CONFIG_LOCK_STAT
        int                             cpu;
        unsigned long                   ip;
 #endif
 };

+#define LOCAL_LOCK_UNLOCKED 1

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 58d78a33ac65..0eadee339e1f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4961,6 +4961,7 @@ void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
        lock->wait_type_outer = outer;
        lock->wait_type_inner = inner;
        lock->lock_type = lock_type;
+       lock->flags = 0;

        /*
         * No key, no joy, we need to hash something.
@@ -5101,6 +5102,9 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
                lockevent_inc(lockdep_nocheck);
        }

+       if (unlikely(lock->flags == LOCAL_LOCK_UNLOCKED))
+               subclass++;
+
        if (subclass < NR_LOCKDEP_CACHING_CLASSES)
                class = lock->class_cache[subclass];
        /*

and the usage from slub/memcg looks like this:

        if (!!local_lock_is_locked(&s->cpu_slab->lock)) {
                ret = __slab_alloc_node(s, alloc_gfp, node, _RET_IP_, size);
                __local_lock_flags_clear(&s->cpu_slab->lock);
        }

With that, all normal local_lock_irqsave() calls automagically work.

At a high level the idea is to tell lockdep: "trust me, I know what I'm doing".
Since it's a per-cpu local lock, the workaround tells lockdep to treat such a
local_lock as nested, so lockdep allows a second local_lock while the same
cpu (in !RT) or task (in RT) is holding another local_lock.

It addresses the 2nd false positive above:
[   14.627331]   lock((local_lock_t *)&c->lock);
[   14.627331]   lock((local_lock_t *)&c->lock);

but doesn't address the first false positive:
[ 1021.956825] inconsistent {INITIAL USE} -> {IN-NMI} usage.

We can silence lockdep for this lock with:

@@ -5839,6 +5840,9 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
         */
        kasan_check_byte(lock);

+       if (unlikely(lock->flags == LOCAL_LOCK_UNLOCKED))
+               trylock = 1;
+
        if (unlikely(!lockdep_enabled())) {

Then all lockdep false positives are gone.

In other words the pair:
        local_lock_is_locked(&local_lock);
        __local_lock_flags_clear(&local_lock);
guards the region where the local_lock can be taken multiple times on that
cpu/task from any context, including NMI.
We know that the task won't migrate, so multiple lock/unlock of an unlocked
lock is safe.

I think this hack/workaround is a lesser evil than local_lock_irqsave_check().
It gives a clean start/end scope for such usage of local_lock.

Thoughts?
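To spell the proposed contract out end to end, here is a rough usage sketch
of that guarded region. The names (my_lock, flags) are illustrative, and the
check polarity follows the if (!__local_lock_is_locked(lock)) pattern quoted
earlier in the thread rather than the actual slub code:

        unsigned long flags;

        /* preemption or migration must already be disabled */
        if (!local_lock_is_locked(&my_lock)) {
                /*
                 * Trusted region: dep_map.flags is LOCAL_LOCK_UNLOCKED, so
                 * lockdep treats acquisitions of my_lock as nested trylocks
                 * and stays quiet, even from NMI or while another lock of
                 * the same local_lock class is held on this cpu/task.
                 */
                /* .. lots of code, possibly re-entered from NMI .. */

                local_lock_irqsave(&my_lock, flags);
                /* critical section */
                local_unlock_irqrestore(&my_lock, flags);

                __local_lock_flags_clear(&my_lock);     /* end of trusted region */
        }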