On 6/4/25 12:59 PM, John Garry wrote: > On 03/06/2025 16:16, Nilay Shroff wrote: >>>> diff --git a/block/blk-settings.c b/block/blk-settings.c >>>> index a000daafbfb4..35c1354dd5ae 100644 >>>> --- a/block/blk-settings.c >>>> +++ b/block/blk-settings.c >>>> @@ -598,8 +598,14 @@ static bool blk_stack_atomic_writes_head(struct queue_limits *t, >>>> !blk_stack_atomic_writes_boundary_head(t, b)) >>>> return false; >>>> - if (t->io_min <= SECTOR_SIZE) { >>>> - /* No chunk sectors, so use bottom device values directly */ >>>> + if (t->io_min <= SECTOR_SIZE || >>>> + (!(t->atomic_write_hw_unit_max % t->io_min) && >>>> + !(t->atomic_write_hw_unit_min % t->io_min))) { >>> So will this now break md raid0/10 or dm stripe when t->io_min is set (> SECTOR_SIZE)? I mean, for md raid0/10 or dm-stripe, we should be taking the chunk size into account there and we now don't seem to be doing so now. >>> >> Shouldn't it be work good if we ensure that a bottom device atomic write unit min/max are >> aligned with the top device chunk sectors then top device could simply copy and use the >> bottom device atomic write limits directly? > > You need to be more specific when you say "aligned". > I meant to say bottom device atomic write unit min/max are multiples of top device chunk sectors . > Consider chunk sectors for md raid0 is 16KB and b->atomic_write_hw_unit_max is 32KB. Then we must reduce t->atomic_write_hw_unit_max to 16KB (so cannot use the value in b->atomic_write_hw_unit_max directly). Okay, I think I understood your concerns here. > >> Or do we have a special case for raid0/10 and >> dm-strip which can't handle atomic write if chunk size for stacked device is greater than >> SECTOR_SIZE? >> >> BTW there's a typo in the above change, we should have the above if check written as below >> (my bad): >> if (t->io_min <= SECTOR_SIZE || >> (!(b->atomic_write_hw_unit_max % t->io_min) && >> !(b->atomic_write_hw_unit_min % t->io_min))) { >> ... >> ... >> >>> What is the value of top device io_min and physical_block_size in your example? >> The NVme disk which I am using has both t->io_min and t->physical_block_size set >> to 4096. > > I need to test further, but maybe we can change the check to this: > > if (t->io_min <= SECTOR_SIZE || t->io_min == t->physical_block_size) { > /* No chunk sectors, so use bottom device values directly */ > t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max; > t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min; > t->atomic_write_hw_max = b->atomic_write_hw_max; > return true; > } How about instead adding a new BLK_FEAT_STRIPED flag and then use it here while setting atomic limits as shown below: diff --git a/block/blk-settings.c b/block/blk-settings.c index a000daafbfb4..bf5d35282d42 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -598,8 +598,14 @@ static bool blk_stack_atomic_writes_head(struct queue_limits *t, !blk_stack_atomic_writes_boundary_head(t, b)) return false; - if (t->io_min <= SECTOR_SIZE) { - /* No chunk sectors, so use bottom device values directly */ + if (t->io_min <= SECTOR_SIZE || !(t->features & BLK_FEAT_STRIPED)) { + /* + * If there are no chunk sectors, or if the top device does not + * advertise the STRIPED feature (i.e., it's not a striped + * device like md-raid0 or dm-stripe), then we directly inherit + * the atomic write capabilities from the underlying (bottom) + * device. + */ t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max; t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min; t->atomic_write_hw_max = b->atomic_write_hw_max; I tested the above change with md-raid0 and dm-strip setup and seems to be working well. What do you think? Thanks, --Nilay