On Jun 13, 2025 / 12:56, Ming Lei wrote:
> On Wed, Jun 04, 2025 at 11:17:05AM +0000, Shinichiro Kawasaki wrote:
> > Cc+: Ming,
> >
> > Hi Sagi, thanks for the background explanation and the suggestion.
> >
> > On Jun 04, 2025 / 10:10, Sagi Grimberg wrote:
> > ...
> > > > This is a problem. We are flushing a work that is I/O bound, which can
> > > > take a long time to complete. Up until now, we have deliberately
> > > > avoided introducing a dependency between reset forward progress and
> > > > scan work I/O completion.
> > > >
> > > > I would like to keep it this way.
> > > >
> > > > BTW, this is not TCP specific.
> >
> > I see. The blktests test case nvme/063 is dedicated to the TCP transport;
> > that's why the problem was reported for TCP.
> >
> > > blk_mq_unquiesce_queue is still very much safe to call as many times as
> > > we want. The only thing that gets in the way is this pesky WARN. How
> > > about we make it go away and have a debug print instead?
> > >
> > > My preference would be to allow nvme to unquiesce queues that were not
> > > previously quiesced (just as it historically did) instead of having to
> > > block a controller reset until scan_work completes (which is admin I/O
> > > dependent, and may get stuck until the admin timeout, which the user can
> > > set to 60 minutes or something arbitrarily long, btw).
> > >
> > > How about something like this patch instead:
> > > --
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index c2697db59109..74f3ad16e812 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -327,8 +327,10 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
> > >          bool run_queue = false;
> > >
> > >          spin_lock_irqsave(&q->queue_lock, flags);
> > > -        if (WARN_ON_ONCE(q->quiesce_depth <= 0)) {
> > > -                ;
> > > +        if (q->quiesce_depth <= 0) {
> > > +                printk(KERN_DEBUG
> > > +                       "dev %s: unquiescing a non-quiesced queue, expected?\n",
> > > +                       q->disk ? q->disk->disk_name : "?");
> > >          } else if (!--q->quiesce_depth) {
> > >                  blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
> > >                  run_queue = true;
> > > --
> >
> > The WARN was introduced with commit e70feb8b3e68 ("blk-mq: support
> > concurrent queue quiesce/unquiesce"), which Ming authored. Ming, may I
> > ask for your comment on Sagi's suggestion?
>
> I think it is bad to use a standard block layer API in this unbalanced way;
> that is why the WARN_ON_ONCE() was added. We shouldn't encourage drivers to
> use APIs this way.
>
> The question is: why does nvme have to unquiesce a non-quiesced queue?

Ming, thanks for the comment. I think the call trace in the commit message of
my patch answers your question: blk_mq_add_queue_tag_set() is called between
the blk_mq_quiesce_tagset() and blk_mq_unquiesce_tagset() calls, so the
unquiesce walk visits one more queue than the quiesce walk did. Let me show
the call trace again below.

Anyway, the WARN disappeared with the v6.16-rc1 kernel, so from my point of
view this problem does not have high priority at the moment.

Reset work                              Scan work
------------------------------------------------------------------------
nvme_reset_ctrl_work()
 nvme_tcp_teardown_ctrl()
  nvme_tcp_teardown_io_queues()
   nvme_quiesce_io_queues()
    blk_mq_quiesce_tagset()
     list_for_each_entry()
      .. N queues
       blk_mq_quiesce_queue()
                                        nvme_scan_work()
                                         nvme_scan_ns_*()
                                          nvme_scan_ns()
                                           nvme_alloc_ns()
                                            blk_mq_alloc_disk()
                                             __blk_mq_alloc_disk()
                                              blk_mq_alloc_queue()
                                               blk_mq_init_allocate_queue()
                                                blk_mq_add_queue_tag_set()
                                                 list_add_tail()
                                                  .. N+1 queues
 nvme_tcp_setup_ctrl()
  nvme_start_ctrl()
   nvme_unquiesce_io_queues()
    blk_mq_unquiesce_tagset()
     list_for_each_entry()
      .. N+1 queues
       blk_mq_unquiesce_queue()
        WARN_ON_ONCE(q->quiesce_depth <= 0)
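
To make the imbalance concrete, below is a minimal user-space model of the
race (a sketch only, not kernel code; the queue names and the in_tagset flag
are made up for illustration). qs[2] plays the role of the queue that
blk_mq_add_queue_tag_set() adds between the two tag-set walks, so the
unquiesce pass visits one more queue than the quiesce pass did and the
quiesce_depth check fires for that queue:

--
#include <stdio.h>

struct queue {
	const char *name;
	int quiesce_depth;	/* models q->quiesce_depth */
	int in_tagset;		/* models membership in the tag set's queue list */
};

static void quiesce_queue(struct queue *q)
{
	q->quiesce_depth++;
}

static void unquiesce_queue(struct queue *q)
{
	if (q->quiesce_depth <= 0)	/* where WARN_ON_ONCE() would fire */
		printf("WARN: %s: unquiesce without quiesce (depth=%d)\n",
		       q->name, q->quiesce_depth);
	else
		q->quiesce_depth--;
}

int main(void)
{
	struct queue qs[3] = {
		{ "nvme0n1", 0, 1 },
		{ "nvme0n2", 0, 1 },
		{ "nvme0n3", 0, 0 },	/* not in the tag set yet */
	};
	int i;

	/* reset work: blk_mq_quiesce_tagset() walks N queues */
	for (i = 0; i < 3; i++)
		if (qs[i].in_tagset)
			quiesce_queue(&qs[i]);

	/* scan work: blk_mq_add_queue_tag_set() races in between */
	qs[2].in_tagset = 1;

	/* reset work: blk_mq_unquiesce_tagset() now walks N+1 queues */
	for (i = 0; i < 3; i++)
		if (qs[i].in_tagset)
			unquiesce_queue(&qs[i]);

	return 0;
}
--

Running this prints the WARN line only for nvme0n3, matching the single
unbalanced blk_mq_unquiesce_queue() call at the bottom of the trace above.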