On 6/26/25 9:32 PM, Bart Van Assche wrote:
> On 6/25/25 10:31 PM, Nilay Shroff wrote:
>> It seems that some other thread on your system acquired
>> ->freeze_lock and never released it, and that prevents
>> the udev-worker thread from making forward progress.
>
> That's wrong. blk_mq_freeze_queue_wait() is waiting for q_usage_counter
> to drop to zero, as the output below shows:
>
> (gdb) list *(blk_mq_freeze_queue_wait+0xf2)
> 0xffffffff823ab0b2 is in blk_mq_freeze_queue_wait (block/blk-mq.c:190).
> 185     }
> 186     EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
> 187
> 188     void blk_mq_freeze_queue_wait(struct request_queue *q)
> 189     {
> 190             wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
> 191     }
> 192     EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
> 193
> 194     int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
>
>> If you haven't enabled lockdep on your system, then can you
>> please configure lockdep and rerun the srp/002 test?
>
> Lockdep was enabled during the test and didn't complain.
>
> This is my analysis of the deadlock:
>
> * Multiple requests are pending:
> # (cd /sys/kernel/debug/block && grep -aH . */*/*/*list) | head
> dm-2/hctx0/cpu0/default_rq_list:0000000035c26c20 {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=137, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:000000005060461e {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=136, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:000000007cd295ec {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=135, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:00000000a4a8006b {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=134, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:000000001f93036f {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=140, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:00000000333baffb {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=173, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:000000002c050850 {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=141, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:000000000668dd8b {.op=WRITE, .cmd_flags=SYNC|META|PRIO, .rq_flags=IO_STAT, .state=idle, .tag=133, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:0000000079b67c9f {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=207, .internal_tag=-1}
> dm-2/hctx0/cpu107/default_rq_list:0000000036254afb {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=1384, .internal_tag=-1}
>
> * queue_if_no_path is enabled for the multipath device dm-2:
> # ls -l /dev/mapper/mpatha
> lrwxrwxrwx 1 root root 7 Jun 26 08:50 /dev/mapper/mpatha -> ../dm-2
> # dmsetup table mpatha
> 0 65536 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 1 2 8:32 1 1
>
> * The block device 8:32 is being deleted:
> # grep '^8:32$' /sys/class/block/*/dev | wc -l
> 0
>
> * blk_mq_freeze_queue_nomemsave() waits for the pending requests to
> finish.
> Because the only path in the multipath device is being deleted,
> and because queue_if_no_path is enabled,
> blk_mq_freeze_queue_nomemsave() hangs.

Thanks! This makes sense now. However, we have a few other limits (e.g.
iostats_passthrough, iostats, write_cache, etc.) which are accessed in the
I/O hot path. Updating any of those limits acquires ->limits_lock and also
freezes the queue. So I wonder how those cases could be addressed?

Thanks,
--Nilay
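To make the dependency cycle in the analysis above explicit, here is a toy
userspace model (purely illustrative, NOT kernel code; the toy_* names are
invented): submitted I/O holds a usage reference, queue_if_no_path parks
the request instead of completing it, and the freeze side waits for the
reference count to reach zero, which can only happen once the held I/O
completes.

```c
/* Toy model of the hang. Mirrors the kernel concepts by analogy only:
 * "usage" stands in for q->q_usage_counter, "held" for requests parked
 * by dm-multipath's queue_if_no_path. */
struct toy_queue {
	int usage;	/* analogous to q->q_usage_counter */
	int held;	/* requests queued because no path is available */
};

/* Submission takes a queue-usage reference (like blk_queue_enter()). */
static void toy_submit(struct toy_queue *q)
{
	q->usage++;
	q->held++;	/* no valid path: the request is parked, not completed */
}

/* Completion drops the reference (like blk_queue_exit()). With the only
 * path deleted and queue_if_no_path set, nothing ever calls this. */
static void toy_complete_all(struct toy_queue *q)
{
	q->usage -= q->held;
	q->held = 0;
}

/* Analogue of blk_mq_freeze_queue_wait(): freezing cannot complete
 * until the usage count reaches zero. */
static int toy_freeze_would_block(const struct toy_queue *q)
{
	return q->usage != 0;
}
```

In this model the freeze waiter blocks forever for the same structural
reason as in the trace: only path reinstatement (or disabling
queue_if_no_path) can complete the parked requests and drop the counter.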