Re: [PATCH 2/2] iomap: align writeback to RAID stripe boundaries

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 30 Jul 2025 10:52:04 +1000

On Tue, Jul 29, 2025 at 12:13:42PM -0400, Tony Battersby wrote:
> Improve writeback performance to RAID-4/5/6 by aligning writes to stripe
> boundaries.  This relies on io_opt being set to the stripe size (or
> a multiple) when BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE is set.

This is the wrong layer to be pulling filesystem write alignments
from.

Filesystems already have alignment information in their on-disk
formats. XFS has stripe unit and stripe width information in the
filesystem superblock that is set by mkfs.xfs.

This information comes from the block device io-opt/io-min values
exposed to userspace at mkfs time, so the filesystem already knows
what the optimal IO alignment parameters are for the storage stack
underneath it.

Indeed, we already align extent allocations to these parameters, so
aligning filesystem writeback to the same configured alignment makes
a lot more sense than pulling random stuff from block devices during
IO submission...
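
A minimal sketch of the arithmetic involved, assuming the filesystem
hands writeback its stripe width in bytes. This is illustration only,
not proposed kernel code; stripe_split_point() and its parameters are
hypothetical names. Stripe widths are often not a power of two (e.g.
three data disks), so it uses a modulo rather than a bit mask.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: for a byte range [pos, end) queued for writeback,
 * return the offset at which the IO should be cut so that it ends on a
 * stripe boundary. Returns 'end' unchanged when no stripe geometry is
 * configured or when the range does not reach a stripe boundary.
 */
static uint64_t stripe_split_point(uint64_t pos, uint64_t end,
				   uint64_t stripe_bytes)
{
	uint64_t aligned_end;

	assert(pos <= end);

	if (stripe_bytes == 0)		/* no stripe geometry configured */
		return end;

	/* Round 'end' down to the nearest stripe boundary. */
	aligned_end = end - (end % stripe_bytes);

	/* No boundary strictly inside the range: leave the IO as it is. */
	if (aligned_end <= pos)
		return end;

	return aligned_end;
}
```

With a 192k stripe width, a range of [0, 500000) would be cut at 393216
(two full stripes), with the tail left to be merged with later dirty
data.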

> @@ -1685,81 +1685,118 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
>  		struct inode *inode, loff_t pos, loff_t end_pos,
>  		unsigned len)
>  {
> -	struct iomap_folio_state *ifs = folio->private;
> -	size_t poff = offset_in_folio(folio, pos);
> -	unsigned int ioend_flags = 0;
> -	int error;
> -
> -	if (wpc->iomap.type == IOMAP_UNWRITTEN)
> -		ioend_flags |= IOMAP_IOEND_UNWRITTEN;
> -	if (wpc->iomap.flags & IOMAP_F_SHARED)
> -		ioend_flags |= IOMAP_IOEND_SHARED;
> -	if (folio_test_dropbehind(folio))
> -		ioend_flags |= IOMAP_IOEND_DONTCACHE;
> -	if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
> -		ioend_flags |= IOMAP_IOEND_BOUNDARY;
> +	struct queue_limits *lim = bdev_limits(wpc->iomap.bdev);
> +	unsigned int io_align =
> +		(lim->features & BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE) ?
> +		lim->io_opt >> SECTOR_SHIFT : 0;

i.e. this alignment should come from the filesystem, not the block
device.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx