Re: [PATCH 5/7] xfs: fill dirty folios on zero range of unwritten mappings

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Wed, 2 Jul 2025 11:50:09 -0700

On Tue, Jun 10, 2025 at 08:24:00AM -0400, Brian Foster wrote:
> On Mon, Jun 09, 2025 at 09:12:19AM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 05, 2025 at 01:33:55PM -0400, Brian Foster wrote:
> > > Use the iomap folio batch mechanism to select folios to zero on zero
> > > range of unwritten mappings. Trim the resulting mapping if the batch
> > > is filled (unlikely for current use cases) to distinguish between a
> > > range to skip and one that requires another iteration due to a full
> > > batch.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/xfs_iomap.c | 23 +++++++++++++++++++++++
> > >  1 file changed, 23 insertions(+)
> > > 
> > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > > index b5cf5bc6308d..63054f7ead0e 100644
> > > --- a/fs/xfs/xfs_iomap.c
> > > +++ b/fs/xfs/xfs_iomap.c
> ...
> > > @@ -1769,6 +1772,26 @@ xfs_buffered_write_iomap_begin(
> > >  		if (offset_fsb < eof_fsb && end_fsb > eof_fsb)
> > >  			end_fsb = eof_fsb;
> > >  
> > > +		/*
> > > +		 * Look up dirty folios for unwritten mappings within EOF.
> > > +		 * Providing this bypasses the flush iomap uses to trigger
> > > +		 * extent conversion when unwritten mappings have dirty
> > > +		 * pagecache in need of zeroing.
> > > +		 *
> > > +		 * Trim the mapping to the end pos of the lookup, which in turn
> > > +		 * was trimmed to the end of the batch if it became full before
> > > +		 * the end of the mapping.
> > > +		 */
> > > +		if (imap.br_state == XFS_EXT_UNWRITTEN &&
> > > +		    offset_fsb < eof_fsb) {
> > > +			loff_t len = min(count,
> > > +					 XFS_FSB_TO_B(mp, imap.br_blockcount));
> > > +
> > > +			end = iomap_fill_dirty_folios(iter, offset, len);
> > 
> > ...though I wonder, does this need to happen in
> > xfs_buffered_write_iomap_begin?  Is it required to hold the ILOCK while
> > we go look for folios in the mapping?  Or could this become a part of
> > iomap_write_begin?
> > 
> 
> Technically it does not need to be inside ->iomap_begin(). The "dirty
> check" just needs to be before the fs drops its own locks associated
> with the mapping lookup to maintain functional correctness, and that
> includes doing it before the callout in the first place (i.e. this is
> how the filemap_range_needs_writeback() logic works). I have various
> older prototype versions of that work that tried to do things a bit more
> generically in that way, but ultimately they seemed less elegant for the
> purpose of zero range.
> 
> WRT zero range, the main reason this is in the callback is that it's
> only required to search for dirty folios when the underlying mapping is
> unwritten, and we don't know that until the filesystem provides the
> mapping (and doing it after the fs drops locks is racy).

<nod>
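
For anyone following along, here is a toy, single-threaded model of the
race described above. It is only a sketch: the race_* names are made up
for illustration, and writeback is collapsed into one atomic step that
both cleans the folio and converts the extent, which is a
simplification of what really happens. It assumes, as discussed above,
that the conversion cannot run while the lookup still holds the fs
lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not kernel types. */
enum race_state { RACE_UNWRITTEN, RACE_WRITTEN };

struct race_file {
	enum race_state extent;	/* state of the on-disk mapping */
	bool folio_dirty;	/* pagecache holds data not yet written back */
};

/*
 * Simplified writeback: flush the dirty folio and convert the extent in
 * one step. The model assumes this cannot run while the mapping lookup
 * holds the fs lock.
 */
static void race_writeback(struct race_file *f)
{
	assert(f);
	if (f->folio_dirty) {
		f->extent = RACE_WRITTEN;
		f->folio_dirty = false;
	}
}

/*
 * Return true if zero range decides the block needs zeroing.
 * check_under_lock selects whether the dirty state is sampled before
 * the lock is dropped (correct) or after writeback had a chance to
 * slip in (racy).
 */
static bool race_needs_zeroing(struct race_file *f, bool check_under_lock)
{
	enum race_state seen;
	bool dirty;

	assert(f);
	seen = f->extent;		/* mapping lookup, lock held */

	if (check_under_lock) {
		dirty = f->folio_dirty;	/* sampled before unlock */
		race_writeback(f);	/* writeback only runs after unlock */
	} else {
		race_writeback(f);	/* writeback runs right after unlock */
		dirty = f->folio_dirty;	/* sampled too late, now clean */
	}

	/* A stale "unwritten" mapping plus a clean folio means "skip". */
	return seen == RACE_WRITTEN || dirty;
}
```

Starting from an unwritten extent with a dirty folio, the under-lock
variant reports that zeroing is needed, while the late check pairs the
stale unwritten mapping with a now-clean folio and skips the range even
though data has reached disk.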

> That said, if we eventually use this for something like buffered writes,
> that is not so much of an issue and we probably want to instead
> lookup/allocate/lock each successive folio up front. That could likely
> occur at the iomap level (lock ordering issues and whatnot
> notwithstanding).
> 
> The one caveat with zero range is that it's only really used for small
> ranges in practice, so it may not really be that big of a deal if the
> folio lookup occurred unconditionally. I think the justification for
> that is tied to broader use of batching in iomap, however, so I don't
> really want to force the issue unless it proves worthwhile. IOW what I'm
> trying to say is that if we do end up with a few more ops using this
> mechanism, it wouldn't surprise me if we just decided to deduplicate to
> the lowest common denominator implementation at that point (and do the
> lookups in iomap iter or something). We're just not there yet IMO.

<nod> I suppose it could be useful for performance reasons to try to
grab as many folios as we can while we still hold the ILOCK, though we'd
have to be careful about lock inversions.

--D

> 
> Brian
> 
> > --D
> > 
> > > +			end_fsb = min_t(xfs_fileoff_t, end_fsb,
> > > +					XFS_B_TO_FSB(mp, end));
> > > +		}
> > > +
> > >  		xfs_trim_extent(&imap, offset_fsb, end_fsb - offset_fsb);
> > >  	}
> > >  
> > > -- 
> > > 2.49.0
> > > 
> > > 
> > 
> 
>
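
Addendum for readers: the trimming arithmetic in the quoted hunk can be
modelled in userspace roughly as below. This is a sketch under stated
assumptions, not the kernel code: the sketch_* helpers, the fixed 4096
byte block size, the one-block-per-folio layout and the dirty[] array
are all hypothetical. In particular, sketch_fill_dirty() only imitates
the behaviour the patch comment describes for
iomap_fill_dirty_folios() (return the end of the lookup, cut short at
the end of the batch if the batch fills first).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block size for the model; real filesystems vary. */
#define SKETCH_BLKSZ 4096ULL

/* Round a byte count up to whole blocks, in the spirit of XFS_B_TO_FSB(). */
static uint64_t sketch_b_to_fsb(uint64_t bytes)
{
	return (bytes + SKETCH_BLKSZ - 1) / SKETCH_BLKSZ;
}

/* Convert blocks to bytes, in the spirit of XFS_FSB_TO_B(). */
static uint64_t sketch_fsb_to_b(uint64_t fsb)
{
	return fsb * SKETCH_BLKSZ;
}

/*
 * Model of the dirty folio lookup: walk one-block "folios" over
 * [offset, offset + len) and count the dirty ones. If the batch fills,
 * return the end of the folio that filled it; otherwise return the end
 * of the requested range. dirty[] must cover every block in the range.
 */
static uint64_t sketch_fill_dirty(const bool *dirty, uint64_t offset,
				  uint64_t len, unsigned int batch_max)
{
	uint64_t end = offset + len;
	uint64_t first = offset / SKETCH_BLKSZ;
	uint64_t last = sketch_b_to_fsb(end);	/* exclusive */
	unsigned int nr = 0;

	assert(batch_max > 0);

	for (uint64_t fsb = first; fsb < last; fsb++) {
		if (!dirty[fsb])
			continue;
		if (++nr == batch_max) {
			uint64_t folio_end = sketch_fsb_to_b(fsb + 1);

			return folio_end < end ? folio_end : end;
		}
	}
	return end;
}

/* Mirror the hunk: clamp the lookup length, then trim end_fsb to it. */
static uint64_t sketch_trim_end_fsb(const bool *dirty, uint64_t offset,
				    uint64_t count, uint64_t map_blocks,
				    uint64_t end_fsb, unsigned int batch_max)
{
	uint64_t map_bytes = sketch_fsb_to_b(map_blocks);
	uint64_t len = count < map_bytes ? count : map_bytes;
	uint64_t end = sketch_fill_dirty(dirty, offset, len, batch_max);
	uint64_t lookup_end_fsb = sketch_b_to_fsb(end);

	return end_fsb < lookup_end_fsb ? end_fsb : lookup_end_fsb;
}
```

With eight blocks of which blocks 1, 3 and 4 are dirty, a batch limit of
two stops the lookup at the end of block 3, so the mapping is trimmed to
four blocks and the caller comes back for the rest. A limit large enough
to hold all three leaves end_fsb untouched, which is how a full batch is
told apart from a range that can simply be skipped.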