Re: [PATCH 1/2] nvme-tcp: avoid race between nvme scan and reset

On 03/06/2025 13:58, Sagi Grimberg wrote:


On 02/06/2025 7:35, Shin'ichiro Kawasaki wrote:
When the nvme scan work and the nvme controller reset work race, the
WARN below happens:

   WARNING: CPU: 1 PID: 69 at block/blk-mq.c:330 blk_mq_unquiesce_queue+0x8f/0xb0

The WARN can be recreated by repeating the blktests test case nvme/063
a few times [1]. It is caused by the scan work allocating a new queue
for the tag set between the reset work's calls to
blk_mq_quiesce_tagset() and blk_mq_unquiesce_tagset().

Reset work                     Scan work
------------------------------------------------------------------------
nvme_reset_ctrl_work()
  nvme_tcp_teardown_ctrl()
   nvme_tcp_teardown_io_queues()
    nvme_quiesce_io_queues()
     blk_mq_quiesce_tagset()
      list_for_each_entry()                                 .. N queues
       blk_mq_quiesce_queue()
                                nvme_scan_work()
                                 nvme_scan_ns_*()
                                  nvme_scan_ns()
                                   nvme_alloc_ns()
                                    blk_mq_alloc_disk()
                                     __blk_mq_alloc_disk()
                                      blk_mq_alloc_queue()
                                       blk_mq_init_allocate_queue()
                                        blk_mq_add_queue_tag_set()
                                         list_add_tail()    .. N+1 queues
  nvme_tcp_setup_ctrl()
   nvme_start_ctrl()
    nvme_unquiesce_io_queues()
     blk_mq_unquiesce_tagset()
      list_for_each_entry()                                 .. N+1 queues
       blk_mq_unquiesce_queue()
        WARN_ON_ONCE(q->quiesce_depth <= 0)

blk_mq_quiesce_queue() is not called for the new queue added by the scan
work, while blk_mq_unquiesce_queue() is called for it. Hence the WARN.
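
For reference, here is a condensed sketch of the two tagset-wide
helpers (adapted from block/blk-mq.c; the tag_list_lock locking and
the per-queue skip-quiesce checks are omitted for brevity):

    void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set)
    {
            struct request_queue *q;

            /* quiesce every queue currently on the tag set's list */
            list_for_each_entry(q, &set->tag_list, tag_set_list)
                    blk_mq_quiesce_queue_nowait(q);
            blk_mq_wait_quiesce_done(set);
    }

    void blk_mq_unquiesce_tagset(struct blk_mq_tag_set *set)
    {
            struct request_queue *q;

            /*
             * A queue added to tag_list after the quiesce pass above
             * gets unquiesced here without ever having been quiesced,
             * so its quiesce_depth is still zero.
             */
            list_for_each_entry(q, &set->tag_list, tag_set_list)
                    blk_mq_unquiesce_queue(q);
    }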

To suppress the WARN, avoid the race between the reset work and the scan
work by flushing the scan work at the beginning of the reset work.

Link: https://lore.kernel.org/linux-nvme/6mhxskdlbo6fk6hotsffvwriauurqky33dfb3s44mqtr5dsxmf@gywwmnyh3twm/ [1]
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>
---
  drivers/nvme/host/tcp.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index f6379aa33d77..14b9d67329b3 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2491,6 +2491,9 @@ static void nvme_reset_ctrl_work(struct work_struct *work)
          container_of(work, struct nvme_ctrl, reset_work);
      int ret;
+    /* prevent racing with ns scanning */
+    flush_work(&ctrl->scan_work);
+

This is a problem. We are flushing a work that is I/O bound, which can
take a long time to complete. Up until now we have deliberately avoided
making reset forward progress depend on scan work I/O fully completing.

I would like to keep it this way.

BTW, this is not TCP specific.

blk_mq_unquiesce_queue is still very much safe to call as many times as we want. The only thing that comes in the way is this pesky WARN. How about we make it go away and have
a debug print instead?

My preference would be to allow nvme to unquiesce queues that were not
previously quiesced (just like it historically did), instead of blocking
a controller reset until scan_work completes. scan_work depends on admin
I/O, and may be stuck until the admin timeout expires (a timeout the user
can raise to 60 minutes or something arbitrarily long, btw).

How about something like this patch instead:
--
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c2697db59109..74f3ad16e812 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -327,8 +327,10 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
        bool run_queue = false;

        spin_lock_irqsave(&q->queue_lock, flags);
-       if (WARN_ON_ONCE(q->quiesce_depth <= 0)) {
-               ;
+       if (q->quiesce_depth <= 0) {
+               printk(KERN_DEBUG
+                       "dev %s: unquiescing a non-quiesced queue, expected?\n",
+                       q->disk ? q->disk->disk_name : "?");
        } else if (!--q->quiesce_depth) {
                blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
                run_queue = true;
--
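
If that message could end up firing frequently, a ratelimited variant
(untested sketch) would keep it from flooding the log:

    if (q->quiesce_depth <= 0) {
            /* same condition as above, just rate-limited */
            pr_debug_ratelimited("dev %s: unquiescing a non-quiesced queue\n",
                                 q->disk ? q->disk->disk_name : "?");
    }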



