On 03/06/2025 12:27, Nilay Shroff wrote:
The current logic applies the bottom device's atomic write limits to
the stacked (top) device only if the top device does not support chunk
sectors. However, this approach is too restrictive.
Note that the assumption is that md raid0/10 or dm-stripe chunk size is
in io_min. It's not ideal, but that's the field which always holds
chunk_sectors for those drivers.
max_hw_sectors would seem to be more appropriate to me...
We should also propagate the bottom device's atomic write limits to the
stacked device if atomic_write_hw_unit_{min,max} of the bottom device
are aligned with the top device's chunk size (io_min). Failing to do so
may unnecessarily reduce the stacked device's atomic write limits to
values much smaller than what the hardware can actually support.
For example, on an NVMe disk with the following controller capability:
$ nvme id-ctrl /dev/nvme1 | grep awupf
awupf : 63
Without the patch applied,
The bottom device (nvme1c1n1) atomic queue limits:
$ /sys/block/nvme1c1n1/queue/atomic_write_boundary_bytes:0
$ /sys/block/nvme1c1n1/queue/atomic_write_max_bytes:262144
$ /sys/block/nvme1c1n1/queue/atomic_write_unit_max_bytes:262144
$ /sys/block/nvme1c1n1/queue/atomic_write_unit_min_bytes:4096
The top device (nvme1n1) atomic queue limits:
$ /sys/block/nvme1n1/queue/atomic_write_boundary_bytes:0
$ /sys/block/nvme1n1/queue/atomic_write_max_bytes:4096
$ /sys/block/nvme1n1/queue/atomic_write_unit_max_bytes:4096
$ /sys/block/nvme1n1/queue/atomic_write_unit_min_bytes:4096
With this patch applied,
The top device (nvme1n1) atomic queue limits:
/sys/block/nvme1n1/queue/atomic_write_boundary_bytes:0
/sys/block/nvme1n1/queue/atomic_write_max_bytes:262144
/sys/block/nvme1n1/queue/atomic_write_unit_max_bytes:262144
/sys/block/nvme1n1/queue/atomic_write_unit_min_bytes:4096
This change ensures that the stacked device retains optimal atomic write
capability when alignment permits, improving overall performance and
correctness.
Reported-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>
Fixes: d7f36dc446e8 ("block: Support atomic writes limits for stacked devices")
Signed-off-by: Nilay Shroff <nilay@xxxxxxxxxxxxx>
---
block/blk-settings.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index a000daafbfb4..35c1354dd5ae 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -598,8 +598,14 @@ static bool blk_stack_atomic_writes_head(struct queue_limits *t,
!blk_stack_atomic_writes_boundary_head(t, b))
return false;
- if (t->io_min <= SECTOR_SIZE) {
- /* No chunk sectors, so use bottom device values directly */
+ if (t->io_min <= SECTOR_SIZE ||
+ (!(t->atomic_write_hw_unit_max % t->io_min) &&
+ !(t->atomic_write_hw_unit_min % t->io_min))) {
So will this now break md raid0/10 or dm stripe when t->io_min is set (>
SECTOR_SIZE)? I mean, for md raid0/10 or dm-stripe, we should be taking
the chunk size into account there and we now don't seem to be doing so now.
What is the value of top device io_min and physical_block_size in your
example?
+ /*
+ * If there are no chunk sectors, or if b->atomic_write_hw_unit
+ * _{min, max} are aligned to the chunk size (t->io_min), then
+ * use the bottom device's values directly.
+ */
t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
t->atomic_write_hw_max = b->atomic_write_hw_max;