On 5/23/25 2:10 AM, Damien Le Moal wrote: > On 5/22/25 19:38, Jens Axboe wrote: >> On 5/22/25 11:14 AM, Bart Van Assche wrote: >>> blk_mq_freeze_queue() never terminates if one or more bios are on the plug >>> list and if the block device driver defines a .submit_bio() method. >>> This is the case for device mapper drivers. The deadlock happens because >>> blk_mq_freeze_queue() waits for q_usage_counter to drop to zero, because >>> a queue reference is held by bios on the plug list and because the >>> __bio_queue_enter() call in __submit_bio() waits for the queue to be >>> unfrozen. >>> >>> This patch fixes the following deadlock: >>> >>> Workqueue: dm-51_zwplugs blk_zone_wplug_bio_work >>> Call trace: >>> __schedule+0xb08/0x1160 >>> schedule+0x48/0xc8 >>> __bio_queue_enter+0xcc/0x1d0 >>> __submit_bio+0x100/0x1b0 >>> submit_bio_noacct_nocheck+0x230/0x49c >>> blk_zone_wplug_bio_work+0x168/0x250 >>> process_one_work+0x26c/0x65c >>> worker_thread+0x33c/0x498 >>> kthread+0x110/0x134 >>> ret_from_fork+0x10/0x20 >>> >>> Call trace: >>> __switch_to+0x230/0x410 >>> __schedule+0xb08/0x1160 >>> schedule+0x48/0xc8 >>> blk_mq_freeze_queue_wait+0x78/0xb8 >>> blk_mq_freeze_queue+0x90/0xa4 >>> queue_attr_store+0x7c/0xf0 >>> sysfs_kf_write+0x98/0xc8 >>> kernfs_fop_write_iter+0x12c/0x1d4 >>> vfs_write+0x340/0x3ac >>> ksys_write+0x78/0xe8 >>> >>> Cc: Christoph Hellwig <hch@xxxxxx> >>> Cc: Damien Le Moal <dlemoal@xxxxxxxxxx> >>> Cc: Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> >>> Cc: Ming Lei <ming.lei@xxxxxxxxxx> >>> Cc: stable@xxxxxxxxxxxxxxx >>> Fixes: dd291d77cc90 ("block: Introduce zone write plugging") >>> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx> >>> --- >>> >>> Changes compared to v1: fixed a race condition. Call bio_zone_write_plugging() >>> only before submitting the bio and not after it has been submitted. >>> >>> block/blk-core.c | 18 ++++++++++++++++-- >>> 1 file changed, 16 insertions(+), 2 deletions(-) >>> >>> diff --git a/block/blk-core.c b/block/blk-core.c >>> index b862c66018f2..713fb3865260 100644 >>> --- a/block/blk-core.c >>> +++ b/block/blk-core.c >>> @@ -621,6 +621,13 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q, >>> return BLK_STS_OK; >>> } >>> >>> +/* >>> + * Do not call bio_queue_enter() if the BIO_ZONE_WRITE_PLUGGING flag has been >>> + * set because this causes blk_mq_freeze_queue() to deadlock if >>> + * blk_zone_wplug_bio_work() submits a bio. Calling bio_queue_enter() for bios >>> + * on the plug list is not necessary since a q_usage_counter reference is held >>> + * while a bio is on the plug list. >>> + */ >>> static void __submit_bio(struct bio *bio) >>> { >>> /* If plug is not used, add new plug here to cache nsecs time. */ >>> @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio) >>> >>> if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) { >>> blk_mq_submit_bio(bio); >>> - } else if (likely(bio_queue_enter(bio) == 0)) { >>> + } else { >>> struct gendisk *disk = bio->bi_bdev->bd_disk; >>> + bool zwp = bio_zone_write_plugging(bio); >>> + >>> + if (unlikely(!zwp && bio_queue_enter(bio) != 0)) >>> + goto finish_plug; >>> >>> if ((bio->bi_opf & REQ_POLLED) && >>> !(disk->queue->limits.features & BLK_FEAT_POLL)) { >>> @@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio) >>> } else { >>> disk->fops->submit_bio(bio); >>> } >>> - blk_queue_exit(disk->queue); >>> + >>> + if (!zwp) >>> + blk_queue_exit(disk->queue); >>> } >> >> This is pretty ugly, and I honestly absolutely hate how there's quite a >> bit of zoned_whatever sprinkling throughout the core code. What's the >> reason for not unplugging here, unaligned writes? Because you should >> presumable have the exact same issues on non-zoned devices if they have >> IO stuck in a plug (and doesn't get unplugged) while someone is waiting >> on a freeze. >> >> A somewhat similar case was solved for IOPOLL and queue entering. That >> would be another thing to look at. Maybe a live enter could work if the >> plug itself pins it? > > What about this patch, completely untested... It's exactly the same thing, more zoned sprinkling in code that should not need to care. Only difference is that it moved to a function. -- Jens Axboe