Re: [RFC] gfs2: Do not call iomap_zero_range beyond eof

Brian Foster <bfoster@xxxxxxxxxx> · Thu, 8 May 2025 11:19:37 -0400

On Thu, May 08, 2025 at 08:04:46AM -0700, Darrick J. Wong wrote:
> On Thu, May 08, 2025 at 10:30:30AM -0400, Brian Foster wrote:
> > On Thu, May 08, 2025 at 03:34:27PM +0200, Andreas Gruenbacher wrote:
> > > Since commit eb65540aa9fc ("iomap: warn on zero range of a post-eof
> > > folio"), iomap_zero_range() warns when asked to zero a folio beyond eof.
> > > The warning triggers on the following code path:
> 
> Which warning?  This one?
> 
> 	/* warn about zeroing folios beyond eof that won't write back */
> 	WARN_ON_ONCE(folio_pos(folio) > iter->inode->i_size);
> 
> If so, then why are there folios that start entirely beyond EOF?
> 

Yeah.. this gfs2 instance is simply a case of their punch hole mechanism
does unconditional partial folio zeroing via iomap zero range, so if a
punch hole occurs on some unaligned range of post-eof blocks, it will
basically create and perform zeroing of post-eof folios. IIUC the caveat
here is that these blocks are all zeroed on alloc (unwritten extents are
apparently not a thing in gfs2), so the punch time zeroing and warning
are spurious. Andreas can correct me if I have any of that wrong.

> > > 
> > >   gfs2_fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
> > >     __gfs2_punch_hole()
> > >       gfs2_block_zero_range()
> > >         iomap_zero_range()
> > > 
> > > So far, gfs2 is just zeroing out partial pages at the beginning and end
> > > of the range, whether beyond eof or not.  The data beyond eof is already
> > > expected to be all zeroes, though.  Truncate the range passed to
> > > iomap_zero_range().
> > > 
> > > As an alternative approach, we could also implicitly truncate the range
> > > inside iomap_zero_range() instead of issuing a warning.  Any thoughts?
> > > 
> > 
> > Thanks Andreas. The more I think about this the more it seems like
> > lifting this logic into iomap is a reasonable compromise between just
> > dropping the warning and forcing individual filesystems to work around
> > it. The original intent of the warning was to have something to catch
> > subtle bad behavior since zero range did update i_size for so long.
> > 
> > OTOH I think it's reasonable to argue that we shouldn't need to warn in
> > situations where we could just enforce correct behavior. Also, I believe
> > we introduced something similar to avoid post-eof weirdness wrt unshare
> > range [1], so precedent exists.
> > 
> > I'm interested if others have opinions on the iomap side.. (though as I
> > write this it looks like hch sits on the side of not tweaking iomap).
> 
> IIRC XFS calls iomap_zero_range during file extending operations to zero
> the tail of a folio that spans EOF, so you'd have to allow for that too.
> 

Yeah, good point. Perhaps we'd want to bail on a folio that starts
beyond EOF with this approach, similar to the warning logic.

Brian

> --D
> 
> > Brian
> > 
> > [1] a311a08a4237 ("iomap: constrain the file range passed to iomap_file_unshare")
> > 
> > > Thanks,
> > > Andreas
> > > 
> > > --
> > > 
> > > Signed-off-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx>
> > > 
> > > diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> > > index b81984def58e..d9a4309cd414 100644
> > > --- a/fs/gfs2/bmap.c
> > > +++ b/fs/gfs2/bmap.c
> > > @@ -1301,6 +1301,10 @@ static int gfs2_block_zero_range(struct inode *inode, loff_t from,
> > >  				 unsigned int length)
> > >  {
> > >  	BUG_ON(current->journal_info);
> > > +	if (from > inode->i_size)
> > > +		return 0;
> > > +	if (from + length > inode->i_size)
> > > +		length = inode->i_size - from;
> > >  	return iomap_zero_range(inode, from, length, NULL, &gfs2_iomap_ops,
> > >  			NULL);
> > >  }
> > > 
> > 
> > 
>