On Wed Aug 6, 2025 at 8:44 AM CEST, Maurizio Lombardi wrote:
> On Wed Aug 6, 2025 at 8:22 AM CEST, Maurizio Lombardi wrote:
>>
>> Oops, sorry, they are two read locks; the real problem then is that
>> something is holding the write lock.
>
> Ok, I think I get what happens now.
>
> The threads that call nvmet_tcp_data_ready() (which takes the read
> lock twice) and nvmet_tcp_release_queue_work() (which tries to take
> the write lock) are blocking each other.
> So I still think that deferring the call to queue->data_ready() by
> using a workqueue should fix it.
>

I reproduced the issue by creating a reader thread that tries to take
the lock twice and a writer thread that takes the write lock between
the two calls to read_lock() (a minimal sketch of such a reproducer
module is appended at the end of this mail):

[ 33.398311] [Reader] Thread started.
[ 33.398410] [Writer] Thread started, waiting for reader to get lock...
[ 33.398577] [Reader] Acquired read_lock successfully.
[ 33.399391] [Reader] Sleeping for a while to allow writer to block...
[ 33.418697] [Writer] Reader has the lock. Attempting to acquire write_lock... THIS SHOULD BLOCK.
[ 41.288105] [Reader] Attempting to acquire a second read_lock... THIS SHOULD BLOCK.
[ 93.388349] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 93.388758] rcu: 7-....: (5999 ticks this GP) idle=9db4/1/0x4000000000000000 softirq=1846/1846 fqs=2444
[ 93.389390] rcu: (t=6001 jiffies g=1917 q=4319 ncpus=8)
[ 93.389745] CPU: 7 UID: 0 PID: 1784 Comm: reader_thread Kdump: loaded Tainted: G OEL ------- --- 6.12.0-116.el10.aarch64 #1 PREEMPT(voluntary)
[ 93.389749] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [L]=SOFTLOCKUP
[ 93.389749] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 93.389750] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 93.389752] pc : queued_spin_lock_slowpath+0x78/0x460
[ 93.389757] lr : queued_read_lock_slowpath+0x21c/0x228
[ 93.389759] sp : ffff80008bd6bdd0
[ 93.389760] x29: ffff80008bd6bdd0 x28: 0000000000000000 x27: 0000000000000000
[ 93.389762] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[ 93.389764] x23: ffffb1c374605008 x22: ffff0000ca9342c0 x21: ffff80008bafb960
[ 93.389766] x20: ffff0000c4735e40 x19: ffffb1c37460701c x18: 0000000000000006
[ 93.389767] x17: 444c554f48532053 x16: ffffb1c3ee73ab48 x15: 636f6c5f64616572
[ 93.389769] x14: 20646e6f63657320 x13: 2e4b434f4c422044 x12: ffffb1c3eff5ec10
[ 93.389771] x11: ffffb1c3efc9ec68 x10: ffffb1c3eff5ec68 x9 : ffffb1c3ee73b4c4
[ 93.389772] x8 : 0000000000000001 x7 : 00000000000bffe8 x6 : c0000000ffff7fff
[ 93.389774] x5 : ffff00112ebe05c8 x4 : 0000000000000000 x3 : 0000000000000000
[ 93.389776] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000001
[ 93.389778] Call trace:
[ 93.389779]  queued_spin_lock_slowpath+0x78/0x460 (P)
[ 93.389782]  queued_read_lock_slowpath+0x21c/0x228
[ 93.389785]  _raw_read_lock+0x60/0x80
[ 93.389787]  reader_thread_fn+0x7c/0xc0 [dead]
[ 93.389791]  kthread+0x110/0x130
[ 93.389794]  ret_from_fork+0x10/0x20

So apparently, in case of contention, writers take precedence: once a
writer is waiting for the lock, new readers are blocked, which is why
the second read_lock() in the same thread never succeeds.

Note that the same problem may also affect nvmet_tcp_write_space().

Maurizio
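
For reference, here is a minimal sketch of the reproducer module
described above. It only illustrates the locking pattern: the module,
thread and symbol names (rwlock_repro, reader_thread_fn,
writer_thread_fn, test_lock) are made up for this sketch and are not
taken from the actual out-of-tree module in the trace.

// SPDX-License-Identifier: GPL-2.0
/*
 * Sketch: a reader takes an rwlock read lock twice, with a writer
 * queueing on the write lock in between the two read_lock() calls.
 */
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/spinlock.h>

static DEFINE_RWLOCK(test_lock);
static struct task_struct *reader_task;
static struct task_struct *writer_task;

static int reader_thread_fn(void *data)
{
	int i;

	pr_info("[Reader] Thread started.\n");

	read_lock(&test_lock);
	pr_info("[Reader] Acquired read_lock successfully.\n");

	/*
	 * Busy-wait while holding the read lock (we must not sleep here),
	 * giving the writer time to queue up on the write lock.
	 */
	pr_info("[Reader] Sleeping for a while to allow writer to block...\n");
	for (i = 0; i < 80; i++)
		mdelay(100);

	/*
	 * With a writer already waiting, new readers are blocked, so this
	 * second (recursive) read_lock() never succeeds and the thread
	 * soft-lockups, as in the trace above.
	 */
	pr_info("[Reader] Attempting to acquire a second read_lock... THIS SHOULD BLOCK.\n");
	read_lock(&test_lock);

	read_unlock(&test_lock);
	read_unlock(&test_lock);
	return 0;
}

static int writer_thread_fn(void *data)
{
	pr_info("[Writer] Thread started, waiting for reader to get lock...\n");
	msleep(20);

	pr_info("[Writer] Reader has the lock. Attempting to acquire write_lock... THIS SHOULD BLOCK.\n");
	write_lock(&test_lock);
	write_unlock(&test_lock);
	return 0;
}

static int __init rwlock_repro_init(void)
{
	reader_task = kthread_run(reader_thread_fn, NULL, "reader_thread");
	if (IS_ERR(reader_task))
		return PTR_ERR(reader_task);

	writer_task = kthread_run(writer_thread_fn, NULL, "writer_thread");
	if (IS_ERR(writer_task)) {
		/* Reader is still busy-waiting here, so stopping it is safe. */
		kthread_stop(reader_task);
		return PTR_ERR(writer_task);
	}

	return 0;
}

static void __exit rwlock_repro_exit(void)
{
	/* Nothing to clean up: once the deadlock hits, both threads are stuck. */
}

module_init(rwlock_repro_init);
module_exit(rwlock_repro_exit);
MODULE_DESCRIPTION("rwlock recursive read vs. waiting writer reproducer sketch");
MODULE_LICENSE("GPL");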