On 7/18/25 05:57, Bart Van Assche wrote:
> Hi Jens,
>
> This patch series improves small write IOPS by a factor of two for zoned UFS
> devices on my test setup. The changes included in this patch series are as
> follows:
> - A new request queue limits flag is introduced that allows block drivers to
>   declare whether or not the request order is preserved per hardware queue.
> - The order of zoned writes is preserved in the block layer by submitting all
>   zoned writes from the same CPU core as long as any zoned writes are pending.
> - A new member 'from_cpu' is introduced in the per-zone data structure
>   'blk_zone_wplug' to track from which CPU to submit zoned writes. This data
>   member is reset to -1 after all pending zoned writes for a zone have
>   completed.
> - The retry count for zoned writes is increased in the SCSI core to deal with
>   reordering caused by unit attention conditions or the SCSI error handler.
> - New functionality is added in the scsi_debug driver to make it easier to
>   test the changes introduced by this patch series.
>
> Please consider this patch series for the next merge window.

Bart,

How did you test this? I do not have a zoned UFS drive, so I used an NVMe ZNS
drive, which should be fine since the commands in the submission queues of a
PCI controller are always handled in order. So I added:

diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
index cce4c5b55aa9..36d16b8d3f37 100644
--- a/drivers/nvme/host/zns.c
+++ b/drivers/nvme/host/zns.c
@@ -108,7 +108,7 @@ int nvme_query_zone_info(struct nvme_ns *ns, unsigned lbaf,
 void nvme_update_zone_info(struct nvme_ns *ns, struct queue_limits *lim,
                            struct nvme_zone_info *zi)
 {
-       lim->features |= BLK_FEAT_ZONED;
+       lim->features |= BLK_FEAT_ZONED | BLK_FEAT_ORDERED_HWQ;
        lim->max_open_zones = zi->max_open_zones;
        lim->max_active_zones = zi->max_active_zones;
        lim->max_hw_zone_append_sectors = ns->ctrl->max_zone_append;

And ran this:

fio --name=test --filename=/dev/nvme1n2 --ioengine=io_uring --iodepth=128 \
    --direct=1 --bs=4096 --zonemode=zbd --rw=randwrite \
    --numjobs=1

And I get unaligned write errors 100% of the time.

Looking at your patches again, you are not handling the REQ_NOWAIT case in
blk_zone_wplug_handle_write(). If you get a REQ_NOWAIT BIO, which io_uring
will issue, the code goes directly to plugging the BIO, thus bypassing your
from_cpu handling.

But the same fio command with libaio (no REQ_NOWAIT in that case) also fails.
I have not looked further into it yet.

-- 
Damien Le Moal
Western Digital Research
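
To illustrate the suspected bypass in a self-contained way, here is a rough
userspace sketch of the control flow described above. This is not the kernel
code or the actual patch: the struct layouts, the helper names
(zone_select_from_cpu, handle_write) and the exact early-plugging shape of
blk_zone_wplug_handle_write() are assumptions based on the cover letter, used
only to model how a REQ_NOWAIT BIO could skip the from_cpu selection.

/*
 * Standalone model (not kernel code) of the suspected problem: if the
 * REQ_NOWAIT path plugs the BIO before the per-zone from_cpu selection
 * runs, nowait writes are submitted from whatever CPU issued them and the
 * ordering guarantee the series relies on is lost. All names are
 * illustrative assumptions.
 */
#include <stdio.h>

#define REQ_NOWAIT (1u << 0)

struct zone_wplug {
	int from_cpu;		/* -1 until the first zoned write is pinned */
};

struct bio {
	unsigned int bi_opf;
	int submit_cpu;
};

/* Hypothetical helper: pin all writes for this zone to one CPU. */
static void zone_select_from_cpu(struct zone_wplug *zwplug, struct bio *bio)
{
	if (zwplug->from_cpu < 0)
		zwplug->from_cpu = bio->submit_cpu;
	bio->submit_cpu = zwplug->from_cpu;
}

/* Simplified stand-in for blk_zone_wplug_handle_write(). */
static void handle_write(struct zone_wplug *zwplug, struct bio *bio)
{
	if (bio->bi_opf & REQ_NOWAIT) {
		/* Plugged immediately: from_cpu is never consulted. */
		printf("nowait write from CPU %d (zone pinned to CPU %d)\n",
		       bio->submit_cpu, zwplug->from_cpu);
		return;
	}

	zone_select_from_cpu(zwplug, bio);
	printf("write pinned to CPU %d\n", bio->submit_cpu);
}

int main(void)
{
	struct zone_wplug zwplug = { .from_cpu = -1 };
	struct bio b1 = { .bi_opf = 0, .submit_cpu = 0 };
	struct bio b2 = { .bi_opf = REQ_NOWAIT, .submit_cpu = 3 };

	handle_write(&zwplug, &b1);	/* pins the zone to CPU 0 */
	handle_write(&zwplug, &b2);	/* bypasses pinning: possible reorder */
	return 0;
}

In this model the second (nowait) write goes out from CPU 3 even though the
zone was pinned to CPU 0, which is the kind of reordering that would explain
unaligned write errors on a zoned device. Whether this matches the actual
patch behaviour needs to be confirmed against the series itself.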