Re: [RFC] gfs2: Do not call iomap_zero_range beyond eof

Brian Foster <bfoster@xxxxxxxxxx> · Thu, 8 May 2025 12:34:33 -0400

On Thu, May 08, 2025 at 09:04:22AM -0700, Darrick J. Wong wrote:
> On Thu, May 08, 2025 at 11:19:37AM -0400, Brian Foster wrote:
> > On Thu, May 08, 2025 at 08:04:46AM -0700, Darrick J. Wong wrote:
> > > On Thu, May 08, 2025 at 10:30:30AM -0400, Brian Foster wrote:
> > > > On Thu, May 08, 2025 at 03:34:27PM +0200, Andreas Gruenbacher wrote:
> > > > > Since commit eb65540aa9fc ("iomap: warn on zero range of a post-eof
> > > > > folio"), iomap_zero_range() warns when asked to zero a folio beyond eof.
> > > > > The warning triggers on the following code path:
> > > 
> > > Which warning?  This one?
> > > 
> > > 	/* warn about zeroing folios beyond eof that won't write back */
> > > 	WARN_ON_ONCE(folio_pos(folio) > iter->inode->i_size);
> > > 
> > > If so, then why are there folios that start entirely beyond EOF?
> > > 
> > 
> > Yeah.. this gfs2 instance is simply a case of their punch hole mechanism
> > does unconditional partial folio zeroing via iomap zero range, so if a
> > punch hole occurs on some unaligned range of post-eof blocks, it will
> > basically create and perform zeroing of post-eof folios. IIUC the caveat
> > here is that these blocks are all zeroed on alloc (unwritten extents are
> > apparently not a thing in gfs2), so the punch time zeroing and warning
> > are spurious. Andreas can correct me if I have any of that wrong.
> 
> Oh, right, because iomap_zero_iter calls iomap_write_begin, which
> allocates a new folio completely beyond EOF, and then we see that new
> folio and WARN about it before scribbling on the folio and dirtying it.
> Correct?
> 
> If so then yeah, it doesn't seem useful to do that... unless the file
> size immediately gets extended such that at least one byte of the dirty
> folio is within EOF.  Even then, that seems like a stretch...
> 

Yep, agreed. An i_size update after a zero op would come after locks are
dropped and whatnot as well, so seems racy and wrong.

> > > > > 
> > > > >   gfs2_fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
> > > > >     __gfs2_punch_hole()
> > > > >       gfs2_block_zero_range()
> > > > >         iomap_zero_range()
> > > > > 
> > > > > So far, gfs2 is just zeroing out partial pages at the beginning and end
> > > > > of the range, whether beyond eof or not.  The data beyond eof is already
> > > > > expected to be all zeroes, though.  Truncate the range passed to
> > > > > iomap_zero_range().
> > > > > 
> > > > > As an alternative approach, we could also implicitly truncate the range
> > > > > inside iomap_zero_range() instead of issuing a warning.  Any thoughts?
> > > > > 
> > > > 
> > > > Thanks Andreas. The more I think about this the more it seems like
> > > > lifting this logic into iomap is a reasonable compromise between just
> > > > dropping the warning and forcing individual filesystems to work around
> > > > it. The original intent of the warning was to have something to catch
> > > > subtle bad behavior since zero range did update i_size for so long.
> > > > 
> > > > OTOH I think it's reasonable to argue that we shouldn't need to warn in
> > > > situations where we could just enforce correct behavior. Also, I believe
> > > > we introduced something similar to avoid post-eof weirdness wrt unshare
> > > > range [1], so precedent exists.
> > > > 
> > > > I'm interested if others have opinions on the iomap side.. (though as I
> > > > write this it looks like hch sits on the side of not tweaking iomap).
> > > 
> > > IIRC XFS calls iomap_zero_range during file extending operations to zero
> > > the tail of a folio that spans EOF, so you'd have to allow for that too.
> > > 
> > 
> > Yeah, good point. Perhaps we'd want to bail on a folio that starts
> > beyond EOF with this approach, similar to the warning logic.
> 
> ...because I don't see much use in zeroing and dirtying a folio that
> starts well beyond EOF since iomap_writepage_handle_eof will ignore it
> and there are several gigantic comments in buffered-io.c about clamping
> to EOF.
> 
> <shrug> But maybe I'm missing a usecase?
> 

Not that I'm aware of.

Brian

> --D
> 
> > Brian
> > 
> > > --D
> > > 
> > > > Brian
> > > > 
> > > > [1] a311a08a4237 ("iomap: constrain the file range passed to iomap_file_unshare")
> > > > 
> > > > > Thanks,
> > > > > Andreas
> > > > > 
> > > > > --
> > > > > 
> > > > > Signed-off-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx>
> > > > > 
> > > > > diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> > > > > index b81984def58e..d9a4309cd414 100644
> > > > > --- a/fs/gfs2/bmap.c
> > > > > +++ b/fs/gfs2/bmap.c
> > > > > @@ -1301,6 +1301,10 @@ static int gfs2_block_zero_range(struct inode *inode, loff_t from,
> > > > >  				 unsigned int length)
> > > > >  {
> > > > >  	BUG_ON(current->journal_info);
> > > > > +	if (from > inode->i_size)
> > > > > +		return 0;
> > > > > +	if (from + length > inode->i_size)
> > > > > +		length = inode->i_size - from;
> > > > >  	return iomap_zero_range(inode, from, length, NULL, &gfs2_iomap_ops,
> > > > >  			NULL);
> > > > >  }
> > > > > 
> > > > 
> > > > 
> > > 
> > 
> > 
>