Re: [RFC] gfs2: Do not call iomap_zero_range beyond eof

Andreas Gruenbacher <agruenba@xxxxxxxxxx> · Thu, 8 May 2025 18:36:35 +0200

On Thu, May 8, 2025 at 6:04 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> On Thu, May 08, 2025 at 11:19:37AM -0400, Brian Foster wrote:
> > On Thu, May 08, 2025 at 08:04:46AM -0700, Darrick J. Wong wrote:
> > > On Thu, May 08, 2025 at 10:30:30AM -0400, Brian Foster wrote:
> > > > On Thu, May 08, 2025 at 03:34:27PM +0200, Andreas Gruenbacher wrote:
> > > > > Since commit eb65540aa9fc ("iomap: warn on zero range of a post-eof
> > > > > folio"), iomap_zero_range() warns when asked to zero a folio beyond eof.
> > > > > The warning triggers on the following code path:
> > >
> > > Which warning?  This one?
> > >
> > >     /* warn about zeroing folios beyond eof that won't write back */
> > >     WARN_ON_ONCE(folio_pos(folio) > iter->inode->i_size);
> > >
> > > If so, then why are there folios that start entirely beyond EOF?
> > >
> >
> > Yeah.. this gfs2 instance is simply a case of their punch hole mechanism
> > does unconditional partial folio zeroing via iomap zero range, so if a
> > punch hole occurs on some unaligned range of post-eof blocks, it will
> > basically create and perform zeroing of post-eof folios. IIUC the caveat
> > here is that these blocks are all zeroed on alloc (unwritten extents are
> > apparently not a thing in gfs2), so the punch time zeroing and warning
> > are spurious. Andreas can correct me if I have any of that wrong.

gfs2 uses ext2-style indirect blocks. It doesn't have extents, delayed
allocation, or any kind of unwritten data tracking.

> Oh, right, because iomap_zero_iter calls iomap_write_begin, which
> allocates a new folio completely beyond EOF, and then we see that new
> folio and WARN about it before scribbling on the folio and dirtying it.
> Correct?

Yes, or at least that's also my understanding.

> If so then yeah, it doesn't seem useful to do that... unless the file
> size immediately gets extended such that at least one byte of the dirty
> folio is within EOF.  Even then, that seems like a stretch...

i_size isn't going to change in a punch hole operation:

int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
{
        [...]
        switch (mode & FALLOC_FL_MODE_MASK) {
        [...]
        case FALLOC_FL_PUNCH_HOLE:
                if (!(mode & FALLOC_FL_KEEP_SIZE))
                        return -EOPNOTSUPP;

> > > > >   gfs2_fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
> > > > >     __gfs2_punch_hole()
> > > > >       gfs2_block_zero_range()
> > > > >         iomap_zero_range()
> > > > >
> > > > > So far, gfs2 is just zeroing out partial pages at the beginning and end
> > > > > of the range, whether beyond eof or not.  The data beyond eof is already
> > > > > expected to be all zeroes, though.  Truncate the range passed to
> > > > > iomap_zero_range().
> > > > >
> > > > > As an alternative approach, we could also implicitly truncate the range
> > > > > inside iomap_zero_range() instead of issuing a warning.  Any thoughts?
> > > > >
> > > >
> > > > Thanks Andreas. The more I think about this the more it seems like
> > > > lifting this logic into iomap is a reasonable compromise between just
> > > > dropping the warning and forcing individual filesystems to work around
> > > > it. The original intent of the warning was to have something to catch
> > > > subtle bad behavior since zero range did update i_size for so long.
> > > >
> > > > OTOH I think it's reasonable to argue that we shouldn't need to warn in
> > > > situations where we could just enforce correct behavior. Also, I believe
> > > > we introduced something similar to avoid post-eof weirdness wrt unshare
> > > > range [1], so precedent exists.
> > > >
> > > > I'm interested if others have opinions on the iomap side.. (though as I
> > > > write this it looks like hch sits on the side of not tweaking iomap).
> > >
> > > IIRC XFS calls iomap_zero_range during file extending operations to zero
> > > the tail of a folio that spans EOF, so you'd have to allow for that too.
> > >
> >
> > Yeah, good point. Perhaps we'd want to bail on a folio that starts
> > beyond EOF with this approach, similar to the warning logic.
>
> ...because I don't see much use in zeroing and dirtying a folio that
> starts well beyond EOF since iomap_writepage_handle_eof will ignore it
> and there are several gigantic comments in buffered-io.c about clamping
> to EOF.
>
> <shrug> But maybe I'm missing a usecase?

Unrelated to iomap_zero_range(), but gfs2 has this fairly scary
special case in which it does insist on writing folios beyond eof. See
gfs2_write_jdata_folio(); this was introduced and is described in
commit fd4c5748b8d3 ("gfs2: writeout truncated pages").

> --D
>
> > Brian
> >
> > > --D
> > >
> > > > Brian
> > > >
> > > > [1] a311a08a4237 ("iomap: constrain the file range passed to iomap_file_unshare")
> > > >
> > > > > Thanks,
> > > > > Andreas
> > > > >
> > > > > --
> > > > >
> > > > > Signed-off-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx>
> > > > >
> > > > > diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> > > > > index b81984def58e..d9a4309cd414 100644
> > > > > --- a/fs/gfs2/bmap.c
> > > > > +++ b/fs/gfs2/bmap.c
> > > > > @@ -1301,6 +1301,10 @@ static int gfs2_block_zero_range(struct inode *inode, loff_t from,
> > > > >                                  unsigned int length)
> > > > >  {
> > > > >         BUG_ON(current->journal_info);
> > > > > +       if (from > inode->i_size)
> > > > > +               return 0;
> > > > > +       if (from + length > inode->i_size)
> > > > > +               length = inode->i_size - from;
> > > > >         return iomap_zero_range(inode, from, length, NULL, &gfs2_iomap_ops,
> > > > >                         NULL);
> > > > >  }
> > > > >
> > > >
> > > >
> > >
> >
> >
>

Thanks,
Andreas