On 6/26/25 9:32 PM, Bart Van Assche wrote:
> On 6/25/25 10:31 PM, Nilay Shroff wrote:
>> It seems that some other thread on your system acquired
>> ->freeze_lock and never released it, and that prevents
>> the udev-worker thread from making forward progress.
>
> That's wrong. blk_mq_freeze_queue_wait() is waiting for q_usage_counter
> to drop to zero, as the output below shows:
>
> (gdb) list *(blk_mq_freeze_queue_wait+0xf2)
> 0xffffffff823ab0b2 is in blk_mq_freeze_queue_wait (block/blk-mq.c:190).
> 185     }
> 186     EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
> 187
> 188     void blk_mq_freeze_queue_wait(struct request_queue *q)
> 189     {
> 190             wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
> 191     }
> 192     EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
> 193
> 194     int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
>
>> If you haven't enabled lockdep on your system, then can you
>> please configure lockdep and rerun the srp/002 test?
>
> Lockdep was enabled during the test and didn't complain.
>
> This is my analysis of the deadlock:
>
> * Multiple requests are pending:
> # (cd /sys/kernel/debug/block && grep -aH . */*/*/*list) | head
> dm-2/hctx0/cpu0/default_rq_list:0000000035c26c20 {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=137, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:000000005060461e {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=136, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:000000007cd295ec {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=135, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:00000000a4a8006b {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=134, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:000000001f93036f {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=140, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:00000000333baffb {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=173, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:000000002c050850 {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=141, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:000000000668dd8b {.op=WRITE, .cmd_flags=SYNC|META|PRIO, .rq_flags=IO_STAT, .state=idle, .tag=133, .internal_tag=-1}
> dm-2/hctx0/cpu0/default_rq_list:0000000079b67c9f {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=207, .internal_tag=-1}
> dm-2/hctx0/cpu107/default_rq_list:0000000036254afb {.op=READ, .cmd_flags=SYNC|IDLE, .rq_flags=IO_STAT, .state=idle, .tag=1384, .internal_tag=-1}
>
> * queue_if_no_path is enabled for the multipath device dm-2:
> # ls -l /dev/mapper/mpatha
> lrwxrwxrwx 1 root root 7 Jun 26 08:50 /dev/mapper/mpatha -> ../dm-2
> # dmsetup table mpatha
> 0 65536 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 1 2 8:32 1 1
>
> * The block device 8:32 is being deleted:
> # grep '^8:32$' /sys/class/block/*/dev | wc -l
> 0
>
> * blk_mq_freeze_queue_nomemsave() waits for the pending requests to
> finish.
> Because the only path in the multipath device is being deleted,
> and because queue_if_no_path is enabled,
> blk_mq_freeze_queue_nomemsave() hangs.

Thanks! This makes sense now. However, we have a few other limits (e.g.
iostats_passthrough, iostats, write_cache, etc.) which are accessed in the
I/O hot path. Updating any of those limits acquires ->limits_lock and also
freezes the queue. So I wonder how those cases could be addressed?

Thanks,
--Nilay
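To make the dependency cycle in the analysis above explicit, here is a toy
userspace model (purely illustrative, NOT kernel code; the toy_* names are
invented): submitted I/O holds a usage reference, queue_if_no_path parks
the request instead of completing it, and the freeze side waits for the
reference count to reach zero, which can only happen once the held I/O
completes.

```c
/* Toy model of the hang. Mirrors the kernel concepts by analogy only:
 * "usage" stands in for q->q_usage_counter, "held" for requests parked
 * by dm-multipath's queue_if_no_path. */
struct toy_queue {
	int usage;	/* analogous to q->q_usage_counter */
	int held;	/* requests queued because no path is available */
};

/* Submission takes a queue-usage reference (like blk_queue_enter()). */
static void toy_submit(struct toy_queue *q)
{
	q->usage++;
	q->held++;	/* no valid path: the request is parked, not completed */
}

/* Completion drops the reference (like blk_queue_exit()). With the only
 * path deleted and queue_if_no_path set, nothing ever calls this. */
static void toy_complete_all(struct toy_queue *q)
{
	q->usage -= q->held;
	q->held = 0;
}

/* Analogue of blk_mq_freeze_queue_wait(): freezing cannot complete
 * until the usage count reaches zero. */
static int toy_freeze_would_block(const struct toy_queue *q)
{
	return q->usage != 0;
}
```

In this model the freeze waiter blocks forever for the same structural
reason as in the trace: only path reinstatement (or disabling
queue_if_no_path) can complete the parked requests and drop the counter.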