Re: [PATCH RFC 4/4] block: use chunk_sectors when evaluating stacked atomic write limits

Nilay Shroff <nilay@xxxxxxxxxxxxx> · Fri, 6 Jun 2025 20:53:16 +0530

On 6/5/25 8:38 PM, John Garry wrote:
> The atomic write unit max is limited by any stack device stripe size.
> 
> It is required that the atomic write unit is a power-of-2 factor of the
> stripe size.
> 
> Currently we use io_min limit to hold the stripe size, and check for a
> io_min <= SECTOR_SIZE when deciding if we have a striped stacked device.
> 
> Nilay reports that this causes a problem when the physical block size is
> greater than SECTOR_SIZE [0].
> 
> Furthermore, io_min may be mutated when stacking devices, and this makes
> it a poor candidate to hold the stripe size. Such an example would be
> when the io_min is less than the physical block size.
> 
> Use chunk_sectors to hold the stripe size, which is more appropriate.
> 
> [0] https://lore.kernel.org/linux-block/888f3b1d-7817-4007-b3b3-1a2ea04df771@xxxxxxxxxxxxx/T/#mecca17129f72811137d3c2f1e477634e77f06781
> 
> Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx>
> ---
>  block/blk-settings.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index a000daafbfb4..5b0f1a854e81 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -594,11 +594,13 @@ static bool blk_stack_atomic_writes_boundary_head(struct queue_limits *t,
>  static bool blk_stack_atomic_writes_head(struct queue_limits *t,
>  				struct queue_limits *b)
>  {
> +	unsigned int chunk_size = t->chunk_sectors << SECTOR_SHIFT;
> +
>  	if (b->atomic_write_hw_boundary &&
>  	    !blk_stack_atomic_writes_boundary_head(t, b))
>  		return false;
>  
> -	if (t->io_min <= SECTOR_SIZE) {
> +	if (!t->chunk_sectors) {
>  		/* No chunk sectors, so use bottom device values directly */
>  		t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
>  		t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
> @@ -617,12 +619,12 @@ static bool blk_stack_atomic_writes_head(struct queue_limits *t,
>  	 * aligned with both limits, i.e. 8K in this example.
>  	 */
>  	t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
> -	while (t->io_min % t->atomic_write_hw_unit_max)
> +	while (chunk_size % t->atomic_write_hw_unit_max)
>  		t->atomic_write_hw_unit_max /= 2;
>  
>  	t->atomic_write_hw_unit_min = min(b->atomic_write_hw_unit_min,
>  					  t->atomic_write_hw_unit_max);
> -	t->atomic_write_hw_max = min(b->atomic_write_hw_max, t->io_min);
> +	t->atomic_write_hw_max = min(b->atomic_write_hw_max, chunk_size);
>  
>  	return true;
>  }

This works well with my NVMe disk, which supports atomic writes. My only
concern is what happens if t->chunk_sectors is also set for an NVMe disk.
I see that nvme_set_chunk_sectors() initializes chunk_sectors for NVMe, and
the value it assigns to lim->chunk_sectors represents "noiob" (i.e. Namespace
Optimal I/O Boundary). My disk has "noiob" set to zero, but if it were
non-zero, would it break the above logic for NVMe atomic writes?
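
To make the concern concrete, here is a minimal userspace sketch of the
patched logic. It is only a model, not kernel code: struct limits_model and
stack_atomic_writes_head_model() are hypothetical names, the boundary check is
left out, the tail of the "no chunk sectors" branch (not visible in the hunk)
is assumed to copy atomic_write_hw_max and return true, and the assert() is a
precondition I added for the model. Whether this path is actually reached with
the noiob-derived chunk_sectors is part of what I am asking.

```c
#include <assert.h>
#include <stdbool.h>

#define SECTOR_SHIFT 9

/* Hypothetical, trimmed-down stand-in for struct queue_limits. */
struct limits_model {
	unsigned int chunk_sectors;            /* in 512-byte sectors */
	unsigned int atomic_write_hw_unit_min; /* in bytes */
	unsigned int atomic_write_hw_unit_max; /* in bytes */
	unsigned int atomic_write_hw_max;      /* in bytes */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Userspace model of the patched blk_stack_atomic_writes_head(),
 * with the atomic_write_hw_boundary check omitted.
 */
static bool stack_atomic_writes_head_model(struct limits_model *t,
					   const struct limits_model *b)
{
	unsigned int chunk_size = t->chunk_sectors << SECTOR_SHIFT;

	/* Model precondition (assumption): bottom device supports atomic writes. */
	assert(b->atomic_write_hw_unit_max != 0);

	if (!t->chunk_sectors) {
		/* No chunk sectors, so use bottom device values directly. */
		t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
		t->atomic_write_hw_unit_min = b->atomic_write_hw_unit_min;
		/* Assumed: these two lines are elided from the quoted hunk. */
		t->atomic_write_hw_max = b->atomic_write_hw_max;
		return true;
	}

	/* Halve unit_max until it evenly divides the chunk size. */
	t->atomic_write_hw_unit_max = b->atomic_write_hw_unit_max;
	while (chunk_size % t->atomic_write_hw_unit_max)
		t->atomic_write_hw_unit_max /= 2;

	t->atomic_write_hw_unit_min = min_uint(b->atomic_write_hw_unit_min,
					       t->atomic_write_hw_unit_max);
	t->atomic_write_hw_max = min_uint(b->atomic_write_hw_max, chunk_size);

	return true;
}
```

In this model, a bottom device with a 16K atomic write unit max keeps 16K when
chunk_sectors is 0. With a made-up chunk_sectors of 24 (12K), the unit max
drops to 4K and atomic_write_hw_max is capped at 12K.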

Thanks,
--Nilay