On 6/3/25 5:47 PM, John Garry wrote: > On 03/06/2025 12:27, Nilay Shroff wrote: >> The current logic applies the bottom device's atomic write limits to >> the stacked (top) device only if the top device does not support chunk >> sectors. However, this approach is too restrictive. > > Note that the assumption is that md raid0/10 or dm-stripe chunk size is in io_min. It's not ideal, but that's the field which always holds chunk_sectors for those drivers. > > max_hw_sectors would seem to be more appropriate to me... > >> >> We should also propagate the bottom device's atomic write limits to the >> stacked device if atomic_write_hw_unit_{min,max} of the bottom device >> are aligned with the top device's chunk size (io_min). Failing to do so >> may unnecessarily reduce the stacked device's atomic write limits to >> values much smaller than what the hardware can actually support. >> >> For example, on an NVMe disk with the following controller capability: >> $ nvme id-ctrl /dev/nvme1 | grep awupf >> awupf : 63 >> >> Without the patch applied, >> >> The bottom device (nvme1c1n1) atomic queue limits: >> $ /sys/block/nvme1c1n1/queue/atomic_write_boundary_bytes:0 >> $ /sys/block/nvme1c1n1/queue/atomic_write_max_bytes:262144 >> $ /sys/block/nvme1c1n1/queue/atomic_write_unit_max_bytes:262144 >> $ /sys/block/nvme1c1n1/queue/atomic_write_unit_min_bytes:4096 >> >> The top device (nvme1n1) atomic queue limits: >> $ /sys/block/nvme1n1/queue/atomic_write_boundary_bytes:0 >> $ /sys/block/nvme1n1/queue/atomic_write_max_bytes:4096 >> $ /sys/block/nvme1n1/queue/atomic_write_unit_max_bytes:4096 >> $ /sys/block/nvme1n1/queue/atomic_write_unit_min_bytes:4096 >> >> With this patch applied, >> >> The top device (nvme1n1) atomic queue limits: >> /sys/block/nvme1n1/queue/atomic_write_boundary_bytes:0 >> /sys/block/nvme1n1/queue/atomic_write_max_bytes:262144 >> /sys/block/nvme1n1/queue/atomic_write_unit_max_bytes:262144 >> /sys/block/nvme1n1/queue/atomic_write_unit_min_bytes:4096 >> >> This change ensures that the stacked device retains optimal atomic write >> capability when alignment permits, improving overall performance and >> correctness. >> >> Reported-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> >> Fixes: d7f36dc446e8 ("block: Support atomic writes limits for stacked devices") >> Signed-off-by: Nilay Shroff <nilay@xxxxxxxxxxxxx> >> --- >> block/blk-settings.c | 10 ++++++++-- >> 1 file changed, 8 insertions(+), 2 deletions(-) >> >> diff --git a/block/blk-settings.c b/block/blk-settings.c >> index a000daafbfb4..35c1354dd5ae 100644 >> --- a/block/blk-settings.c >> +++ b/block/blk-settings.c >> @@ -598,8 +598,14 @@ static bool blk_stack_atomic_writes_head(struct queue_limits *t, >> !blk_stack_atomic_writes_boundary_head(t, b)) >> return false; >> - if (t->io_min <= SECTOR_SIZE) { >> - /* No chunk sectors, so use bottom device values directly */ >> + if (t->io_min <= SECTOR_SIZE || >> + (!(t->atomic_write_hw_unit_max % t->io_min) && >> + !(t->atomic_write_hw_unit_min % t->io_min))) { > > So will this now break md raid0/10 or dm stripe when t->io_min is set (> SECTOR_SIZE)? I mean, for md raid0/10 or dm-stripe, we should be taking the chunk size into account there and we now don't seem to be doing so now. > Shouldn't it be work good if we ensure that a bottom device atomic write unit min/max are aligned with the top device chunk sectors then top device could simply copy and use the bottom device atomic write limits directly? Or do we have a special case for raid0/10 and dm-strip which can't handle atomic write if chunk size for stacked device is greater than SECTOR_SIZE? BTW there's a typo in the above change, we should have the above if check written as below (my bad): if (t->io_min <= SECTOR_SIZE || (!(b->atomic_write_hw_unit_max % t->io_min) && !(b->atomic_write_hw_unit_min % t->io_min))) { ... ... > What is the value of top device io_min and physical_block_size in your example? The NVme disk which I am using has both t->io_min and t->physical_block_size set to 4096. Thanks, --Nilay