Re: [PATCH] block: fix atomic write limits for stacked devices

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 6/3/25 5:47 PM, John Garry wrote:
> On 03/06/2025 12:27, Nilay Shroff wrote:
>> The current logic applies the bottom device's atomic write limits to
>> the stacked (top) device only if the top device does not support chunk
>> sectors. However, this approach is too restrictive.
> 
> Note that the assumption is that md raid0/10 or dm-stripe chunk size is in io_min. It's not ideal, but that's the field which always holds chunk_sectors for those drivers.
> 
> max_hw_sectors would seem to be more appropriate to me...
> 
>>
>> We should also propagate the bottom device's atomic write limits to the
>> stacked device if atomic_write_hw_unit_{min,max} of the bottom device
>> are aligned with the top device's chunk size (io_min). Failing to do so
>> may unnecessarily reduce the stacked device's atomic write limits to
>> values much smaller than what the hardware can actually support.
>>
>> For example, on an NVMe disk with the following controller capability:
>> $ nvme id-ctrl /dev/nvme1  | grep awupf
>> awupf     : 63
>>
>> Without the patch applied,
>>
>> The bottom device (nvme1c1n1) atomic queue limits:
>> $ /sys/block/nvme1c1n1/queue/atomic_write_boundary_bytes:0
>> $ /sys/block/nvme1c1n1/queue/atomic_write_max_bytes:262144
>> $ /sys/block/nvme1c1n1/queue/atomic_write_unit_max_bytes:262144
>> $ /sys/block/nvme1c1n1/queue/atomic_write_unit_min_bytes:4096
>>
>> The top device (nvme1n1) atomic queue limits:
>> $ /sys/block/nvme1n1/queue/atomic_write_boundary_bytes:0
>> $ /sys/block/nvme1n1/queue/atomic_write_max_bytes:4096
>> $ /sys/block/nvme1n1/queue/atomic_write_unit_max_bytes:4096
>> $ /sys/block/nvme1n1/queue/atomic_write_unit_min_bytes:4096
>>
>> With this patch applied,
>>
>> The top device (nvme1n1) atomic queue limits:
>> /sys/block/nvme1n1/queue/atomic_write_boundary_bytes:0
>> /sys/block/nvme1n1/queue/atomic_write_max_bytes:262144
>> /sys/block/nvme1n1/queue/atomic_write_unit_max_bytes:262144
>> /sys/block/nvme1n1/queue/atomic_write_unit_min_bytes:4096
>>
>> This change ensures that the stacked device retains optimal atomic write
>> capability when alignment permits, improving overall performance and
>> correctness.
>>
>> Reported-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>
>> Fixes: d7f36dc446e8 ("block: Support atomic writes limits for stacked devices")
>> Signed-off-by: Nilay Shroff <nilay@xxxxxxxxxxxxx>
>> ---
>>   block/blk-settings.c | 10 ++++++++--
>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index a000daafbfb4..35c1354dd5ae 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -598,8 +598,14 @@ static bool blk_stack_atomic_writes_head(struct queue_limits *t,
>>           !blk_stack_atomic_writes_boundary_head(t, b))
>>           return false;
>>   -    if (t->io_min <= SECTOR_SIZE) {
>> -        /* No chunk sectors, so use bottom device values directly */
>> +    if (t->io_min <= SECTOR_SIZE ||
>> +        (!(t->atomic_write_hw_unit_max % t->io_min) &&
>> +         !(t->atomic_write_hw_unit_min % t->io_min))) {
> 
> So will this now break md raid0/10 or dm stripe when t->io_min is set (> SECTOR_SIZE)? I mean, for md raid0/10 or dm-stripe, we should be taking the chunk size into account there and we now don't seem to be doing so now.
> 
Shouldn't it be work good if we ensure that a bottom device atomic write unit min/max are
aligned with the top device chunk sectors then top device could simply copy and use the
bottom device atomic write limits directly? Or do we have a special case for raid0/10 and
dm-strip which can't handle atomic write if chunk size for stacked device is greater than
SECTOR_SIZE?

BTW there's a typo in the above change, we should have the above if check written as below
(my bad):
    if (t->io_min <= SECTOR_SIZE ||
        (!(b->atomic_write_hw_unit_max % t->io_min) &&
         !(b->atomic_write_hw_unit_min % t->io_min))) {
    ...
    ...

> What is the value of top device io_min and physical_block_size in your example?
The NVme disk which I am using has both t->io_min and t->physical_block_size set 
to 4096.

Thanks,
--Nilay





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux