On 5/22/25 20:32, Bart Van Assche wrote: > On 5/22/25 10:38 AM, Jens Axboe wrote: >> On 5/22/25 11:14 AM, Bart Van Assche wrote: >>> static void __submit_bio(struct bio *bio) >>> { >>> /* If plug is not used, add new plug here to cache nsecs time. */ >>> @@ -633,8 +640,12 @@ static void __submit_bio(struct bio *bio) >>> >>> if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) { >>> blk_mq_submit_bio(bio); >>> - } else if (likely(bio_queue_enter(bio) == 0)) { >>> + } else { >>> struct gendisk *disk = bio->bi_bdev->bd_disk; >>> + bool zwp = bio_zone_write_plugging(bio); >>> + >>> + if (unlikely(!zwp && bio_queue_enter(bio) != 0)) >>> + goto finish_plug; >>> >>> if ((bio->bi_opf & REQ_POLLED) && >>> !(disk->queue->limits.features & BLK_FEAT_POLL)) { >>> @@ -643,9 +654,12 @@ static void __submit_bio(struct bio *bio) >>> } else { >>> disk->fops->submit_bio(bio); >>> } >>> - blk_queue_exit(disk->queue); >>> + >>> + if (!zwp) >>> + blk_queue_exit(disk->queue); >>> } >> >> This is pretty ugly, and I honestly absolutely hate how there's quite a >> bit of zoned_whatever sprinkling throughout the core code. What's the >> reason for not unplugging here, unaligned writes? Because you should >> presumable have the exact same issues on non-zoned devices if they have >> IO stuck in a plug (and doesn't get unplugged) while someone is waiting >> on a freeze. >> >> A somewhat similar case was solved for IOPOLL and queue entering. That >> would be another thing to look at. Maybe a live enter could work if the >> plug itself pins it? > > Hi Jens, > > q->q_usage_counter is not increased for bios on current->plug_list. > q->q_usage_counter is increased before a bio is added to the zoned > pluglist. So these two cases are different. > > I think it is important to hold a q->q_usage_counter reference for bios > on the zoned plug list because bios are added to that list after bio > splitting happened. Hence, request queue limits must not change while > any bio is on the zoned plug list. > > Damien, do you think it would be possible to add a bio to the zoned plug > list before it is split and not to hold q->q_usage_counter for bios on > the zoned plug list? I do not think this is the right thing to do as that will completely mess up the queue usage counter value with regard to submit_bio() calls. That value is incremented when a BIO is submitted, and that should stay the same for zoned write BIOs that endup in a zone write plug instead of going straigth to the device. The issue comes from the fact that blk_zone_wplug_bio_work() calls submit_bio_noacct_nocheck() which eventually calls __submit_bio() and that function calls blk_queue_enter() for DM devices. We still need to preserve that for the first submission of the BIO but should not/do not need to do the blk_queue_enter() call again when a BIO is submitted via the zone write plug work. I am not sure yet how to best deal with that. I am traveling today and will not have time to cook something. Will take a look over the weekend. -- Damien Le Moal Western Digital Research