On 6/25/25 10:31 PM, Nilay Shroff wrote:
> It seems that some other thread on your system acquired
> ->freeze_lock and never released it and that prevents
> the udev-worker thread to forward progress.
That's wrong: blk_mq_freeze_queue_wait() is waiting for q_usage_counter
to drop to zero, as the gdb output below shows:
(gdb) list *(blk_mq_freeze_queue_wait+0xf2)
0xffffffff823ab0b2 is in blk_mq_freeze_queue_wait (block/blk-mq.c:190).
185 }
186 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
187
188 void blk_mq_freeze_queue_wait(struct request_queue *q)
189 {
190 wait_event(q->mq_freeze_wq,
percpu_ref_is_zero(&q->q_usage_counter));
191 }
192 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
193
194 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
> If you haven't enabled lockdep on your system then can you
> please configure lockdep and rerun the srp/002 test?
Lockdep was enabled during the test and didn't complain.
This is my analysis of the deadlock:
* Multiple requests are pending:
# (cd /sys/kernel/debug/block && grep -aH . */*/*/*list) | head
dm-2/hctx0/cpu0/default_rq_list:0000000035c26c20 {.op=READ,
.cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=137,
.internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000005060461e {.op=READ,
.cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=136,
.internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000007cd295ec {.op=READ,
.cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=135,
.internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:00000000a4a8006b {.op=READ,
.cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=134,
.internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000001f93036f {.op=READ,
.cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=140,
.internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:00000000333baffb {.op=READ,
.cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=173,
.internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000002c050850 {.op=READ,
.cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=141,
.internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:000000000668dd8b {.op=WRITE,
.cmd_flags=SYNC|META|PRIO, .rq_flags=IO_STAT, .state=idle, .tag=133,
.internal_tag=-1}
dm-2/hctx0/cpu0/default_rq_list:0000000079b67c9f {.op=READ,
.cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=207,
.internal_tag=-1}
dm-2/hctx0/cpu107/default_rq_list:0000000036254afb {.op=READ,
.cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=1384,
.internal_tag=-1}
* queue_if_no_path is enabled for the multipath device dm-2:
# ls -l /dev/mapper/mpatha
lrwxrwxrwx 1 root root 7 Jun 26 08:50 /dev/mapper/mpatha -> ../dm-2
# dmsetup table mpatha
0 65536 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 1 2 8:32 1 1
* The block device 8:32 is being deleted:
# grep '^8:32$' /sys/class/block/*/dev | wc -l
0
* blk_mq_freeze_queue_nomemsave() waits for the pending requests to
finish. Because the only path of the multipath device is being deleted
and because queue_if_no_path is enabled, the pending requests remain
queued indefinitely and hence blk_mq_freeze_queue_nomemsave() hangs.
Bart.