On Tue, Apr 15, 2025 at 12:14:20PM +0000, John Garry wrote: > For when large atomic writes (> 1x FS block) are supported, there will be > various occasions when HW offload may not be possible. > > Such instances include: > - unaligned extent mapping wrt write length > - extent mappings which do not cover the full write, e.g. the write spans > sparse or mixed-mapping extents > - the write length is greater than HW offload can support > > In those cases, we need to fallback to the CoW-based atomic write mode. For > this, report special code -ENOPROTOOPT to inform the caller that HW > offload-based method is not possible. > > In addition to the occasions mentioned, if the write covers an unallocated > range, we again judge that we need to rely on the CoW-based method when we > would need to allocate anything more than 1x block. This is because if we > allocate less blocks that is required for the write, then again HW > offload-based method would not be possible. So we are taking a pessimistic > approach to writes covering unallocated space. > > Signed-off-by: John Garry <john.g.garry@xxxxxxxxxx> > Reviewed-by: "Darrick J. Wong" <djwong@xxxxxxxxxx> > --- > fs/xfs/xfs_iomap.c | 65 ++++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 63 insertions(+), 2 deletions(-) > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index 049655ebc3f7..02bb8257ea24 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c > @@ -798,6 +798,41 @@ imap_spans_range( > return true; > } > > +static bool > +xfs_bmap_hw_atomic_write_possible( > + struct xfs_inode *ip, > + struct xfs_bmbt_irec *imap, > + xfs_fileoff_t offset_fsb, > + xfs_fileoff_t end_fsb) > +{ > + struct xfs_mount *mp = ip->i_mount; > + xfs_fsize_t len = XFS_FSB_TO_B(mp, end_fsb - offset_fsb); > + > + /* > + * atomic writes are required to be naturally aligned for disk blocks, > + * which ensures that we adhere to block layer rules that we won't > + * straddle any boundary or violate write alignment requirement. > + */ > + if (!IS_ALIGNED(imap->br_startblock, imap->br_blockcount)) > + return false; > + > + /* > + * Spanning multiple extents would mean that multiple BIOs would be > + * issued, and so would lose atomicity required for REQ_ATOMIC-based > + * atomics. > + */ > + if (!imap_spans_range(imap, offset_fsb, end_fsb)) > + return false; > + > + /* > + * The ->iomap_begin caller should ensure this, but check anyway. > + */ > + if (len > xfs_inode_buftarg(ip)->bt_bdev_awu_max) > + return false; This needs to check len against bt_bdev_awu_min so that we don't submit too-short atomic writes to the hardware. Let's say that the hardware minimum is 32k and the fsblock size is 4k. XFS can perform an out of place write for 4k-16k writes, but right now we'll just throw invalid commands at the bdev, and it'll return EINVAL. /me wonders if statx should grow a atomic_write_unit_min_opt field too, unless everyone in block layer land is convinced that awu_min will always match lbasize? (I probably missed that conversation) --D > + > + return true; > +} > + > static int > xfs_direct_write_iomap_begin( > struct inode *inode, > @@ -812,9 +847,11 @@ xfs_direct_write_iomap_begin( > struct xfs_bmbt_irec imap, cmap; > xfs_fileoff_t offset_fsb = XFS_B_TO_FSBT(mp, offset); > xfs_fileoff_t end_fsb = xfs_iomap_end_fsb(mp, offset, length); > + xfs_fileoff_t orig_end_fsb = end_fsb; > int nimaps = 1, error = 0; > bool shared = false; > u16 iomap_flags = 0; > + bool needs_alloc; > unsigned int lockmode; > u64 seq; > > @@ -875,13 +912,37 @@ xfs_direct_write_iomap_begin( > (flags & IOMAP_DIRECT) || IS_DAX(inode)); > if (error) > goto out_unlock; > - if (shared) > + if (shared) { > + if ((flags & IOMAP_ATOMIC) && > + !xfs_bmap_hw_atomic_write_possible(ip, &cmap, > + offset_fsb, end_fsb)) { > + error = -ENOPROTOOPT; > + goto out_unlock; > + } > goto out_found_cow; > + } > end_fsb = imap.br_startoff + imap.br_blockcount; > length = XFS_FSB_TO_B(mp, end_fsb) - offset; > } > > - if (imap_needs_alloc(inode, flags, &imap, nimaps)) > + needs_alloc = imap_needs_alloc(inode, flags, &imap, nimaps); > + > + if (flags & IOMAP_ATOMIC) { > + error = -ENOPROTOOPT; > + /* > + * If we allocate less than what is required for the write > + * then we may end up with multiple extents, which means that > + * REQ_ATOMIC-based cannot be used, so avoid this possibility. > + */ > + if (needs_alloc && orig_end_fsb - offset_fsb > 1) > + goto out_unlock; > + > + if (!xfs_bmap_hw_atomic_write_possible(ip, &imap, offset_fsb, > + orig_end_fsb)) > + goto out_unlock; > + } > + > + if (needs_alloc) > goto allocate_blocks; > > /* > -- > 2.31.1 > >