Re: [PATCH] locking: detect spin_lock_irq() call with disabled interrupts

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Fri, 6 Jun 2025 12:58:30 +0200

On Fri, Jun 06, 2025 at 05:57:23PM +0800, Pavel Tikhomirov wrote:
> This is intended to easily detect irq spinlock self-deadlocks like:
> 
>   spin_lock_irq(A);
>   spin_lock_irq(B);
>   spin_unlock_irq(B);
>     IRQ {
>       spin_lock(A); <- deadlocks
>       spin_unlock(A);
>     }
>   spin_unlock_irq(A);
> 
> Recently we saw this kind of deadlock on our partner's node:
> 
> PID: 408      TASK: ffff8eee0870ca00  CPU: 36   COMMAND: "kworker/36:1H"
>  #0 [fffffe3861831e60] crash_nmi_callback at ffffffff97269e31
>  #1 [fffffe3861831e68] nmi_handle at ffffffff972300bb
>  #2 [fffffe3861831eb0] default_do_nmi at ffffffff97e9e000
>  #3 [fffffe3861831ed0] exc_nmi at ffffffff97e9e211
>  #4 [fffffe3861831ef0] end_repeat_nmi at ffffffff98001639
>     [exception RIP: native_queued_spin_lock_slowpath+638]
>     RIP: ffffffff97eb31ae  RSP: ffffb1c8cd2a4d40  RFLAGS: 00000046
>     RAX: 0000000000000000  RBX: ffff8f2dffb34780  RCX: 0000000000940000
>     RDX: 000000000000002a  RSI: 0000000000ac0000  RDI: ffff8eaed4eb81c0
>     RBP: ffff8eaed4eb81c0   R8: 0000000000000000   R9: ffff8f2dffaf3438
>     R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
>     R13: 0000000000000024  R14: 0000000000000000  R15: ffffd1c8bfb24b80
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> --- <NMI exception stack> ---
>  #5 [ffffb1c8cd2a4d40] native_queued_spin_lock_slowpath at ffffffff97eb31ae
>  #6 [ffffb1c8cd2a4d60] _raw_spin_lock_irqsave at ffffffff97eb2730
>  #7 [ffffb1c8cd2a4d70] __wake_up at ffffffff9737c02d
>  #8 [ffffb1c8cd2a4da0] sbitmap_queue_wake_up at ffffffff9786c74d
>  #9 [ffffb1c8cd2a4dc8] sbitmap_queue_clear at ffffffff9786cc97
> --- <IRQ stack> ---
>     [exception RIP: _raw_spin_unlock_irq+20]
>     RIP: ffffffff97eb2e84  RSP: ffffb1c8cd90fd18  RFLAGS: 00000283
>     RAX: 0000000000000001  RBX: ffff8eafb68efb40  RCX: 0000000000000001
>     RDX: 0000000000000008  RSI: 0000000000000061  RDI: ffff8eafb06c3c70
>     RBP: ffff8eee7af43000   R8: ffff8eaed4eb81c8   R9: ffff8eaed4eb81c8
>     R10: 0000000000000008  R11: 0000000000000008  R12: 0000000000000000
>     R13: ffff8eafb06c3bd0  R14: ffff8eafb06c3bc0  R15: ffff8eaed4eb81c0
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> 
> Luckily it was already fixed in mainstream by:
> commit b313a8c83551 ("block: Fix lockdep warning in blk_mq_mark_tag_wait")
> 
> Currently if we are unlucky we may miss such a deadlock on our testing
> system as it is racy and it depends on the specific interrupt handler
> appearing at the right place and at the right time. So this patch tries
> to detect the problem despite the absence of the interrupt.
> 
> If we see spin_lock_irq under interrupts already disabled we can assume
> that it has paired spin_unlock_irq which would reenable interrupts where
> they should not be reenabled. So we report a warning for it.
> 
> Same thing on spin_unlock_irq even if we were lucky and there was no
> deadlock let's report if interrupts were enabled.
> 
> Let's make this functionality catch one problem and then be disabled, to
> prevent from spamming kernel log with warnings. Also let's add sysctl
> kernel.debug_spin_lock_irq_with_disabled_interrupts to reenable it if
> needed. Also let's add a by default enabled configuration option
> DEBUG_SPINLOCK_IRQ_WITH_DISABLED_INTERRUPTS_BY_DEFAULT, in case we will
> need this on boot.
> 
> Yes Lockdep can detect that, if it sees both the interrupt stack and the
> regular stack where we can get into interrupt with spinlock held. But
> with this approach we can detect the problem even without ever getting
> into interrupt stack. And also this functionality seems to be more
> lightweight then Lockdep as it does not need to maintain lock dependency
> graph.

So why do we need DEBUG_SPINLOCK code, that's injected into every single
callsite, if lockdep can already detect this?