On Thu, May 8, 2025 at 6:04 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote: > On Thu, May 08, 2025 at 11:19:37AM -0400, Brian Foster wrote: > > On Thu, May 08, 2025 at 08:04:46AM -0700, Darrick J. Wong wrote: > > > On Thu, May 08, 2025 at 10:30:30AM -0400, Brian Foster wrote: > > > > On Thu, May 08, 2025 at 03:34:27PM +0200, Andreas Gruenbacher wrote: > > > > > Since commit eb65540aa9fc ("iomap: warn on zero range of a post-eof > > > > > folio"), iomap_zero_range() warns when asked to zero a folio beyond eof. > > > > > The warning triggers on the following code path: > > > > > > Which warning? This one? > > > > > > /* warn about zeroing folios beyond eof that won't write back */ > > > WARN_ON_ONCE(folio_pos(folio) > iter->inode->i_size); > > > > > > If so, then why are there folios that start entirely beyond EOF? > > > > > > > Yeah.. this gfs2 instance is simply a case of their punch hole mechanism > > does unconditional partial folio zeroing via iomap zero range, so if a > > punch hole occurs on some unaligned range of post-eof blocks, it will > > basically create and perform zeroing of post-eof folios. IIUC the caveat > > here is that these blocks are all zeroed on alloc (unwritten extents are > > apparently not a thing in gfs2), so the punch time zeroing and warning > > are spurious. Andreas can correct me if I have any of that wrong. gfs2 uses ext2-style indirect blocks. It doesn't have extents, delayed allocation, or any kind of unwritten data tracking. > Oh, right, because iomap_zero_iter calls iomap_write_begin, which > allocates a new folio completely beyond EOF, and then we see that new > folio and WARN about it before scribbling on the folio and dirtying it. > Correct? Yes, or at least that's also my understanding. > If so then yeah, it doesn't seem useful to do that... unless the file > size immediately gets extended such that at least one byte of the dirty > folio is within EOF. Even then, that seems like a stretch... i_size isn't going to change in a punch hole operation: int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len) { [...] switch (mode & FALLOC_FL_MODE_MASK) { [...] case FALLOC_FL_PUNCH_HOLE: if (!(mode & FALLOC_FL_KEEP_SIZE)) return -EOPNOTSUPP; > > > > > gfs2_fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE) > > > > > __gfs2_punch_hole() > > > > > gfs2_block_zero_range() > > > > > iomap_zero_range() > > > > > > > > > > So far, gfs2 is just zeroing out partial pages at the beginning and end > > > > > of the range, whether beyond eof or not. The data beyond eof is already > > > > > expected to be all zeroes, though. Truncate the range passed to > > > > > iomap_zero_range(). > > > > > > > > > > As an alternative approach, we could also implicitly truncate the range > > > > > inside iomap_zero_range() instead of issuing a warning. Any thoughts? > > > > > > > > > > > > > Thanks Andreas. The more I think about this the more it seems like > > > > lifting this logic into iomap is a reasonable compromise between just > > > > dropping the warning and forcing individual filesystems to work around > > > > it. The original intent of the warning was to have something to catch > > > > subtle bad behavior since zero range did update i_size for so long. > > > > > > > > OTOH I think it's reasonable to argue that we shouldn't need to warn in > > > > situations where we could just enforce correct behavior. Also, I believe > > > > we introduced something similar to avoid post-eof weirdness wrt unshare > > > > range [1], so precedent exists. > > > > > > > > I'm interested if others have opinions on the iomap side.. (though as I > > > > write this it looks like hch sits on the side of not tweaking iomap). > > > > > > IIRC XFS calls iomap_zero_range during file extending operations to zero > > > the tail of a folio that spans EOF, so you'd have to allow for that too. > > > > > > > Yeah, good point. Perhaps we'd want to bail on a folio that starts > > beyond EOF with this approach, similar to the warning logic. > > ...because I don't see much use in zeroing and dirtying a folio that > starts well beyond EOF since iomap_writepage_handle_eof will ignore it > and there are several gigantic comments in buffered-io.c about clamping > to EOF. > > <shrug> But maybe I'm missing a usecase? Unrelated to iomap_zero_range(), but gfs2 has this fairly scary special case in which it does insist on writing folios beyond eof. See gfs2_write_jdata_folio(); this was introduced and is described in commit fd4c5748b8d3 ("gfs2: writeout truncated pages"). > --D > > > Brian > > > > > --D > > > > > > > Brian > > > > > > > > [1] a311a08a4237 ("iomap: constrain the file range passed to iomap_file_unshare") > > > > > > > > > Thanks, > > > > > Andreas > > > > > > > > > > -- > > > > > > > > > > Signed-off-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx> > > > > > > > > > > diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c > > > > > index b81984def58e..d9a4309cd414 100644 > > > > > --- a/fs/gfs2/bmap.c > > > > > +++ b/fs/gfs2/bmap.c > > > > > @@ -1301,6 +1301,10 @@ static int gfs2_block_zero_range(struct inode *inode, loff_t from, > > > > > unsigned int length) > > > > > { > > > > > BUG_ON(current->journal_info); > > > > > + if (from > inode->i_size) > > > > > + return 0; > > > > > + if (from + length > inode->i_size) > > > > > + length = inode->i_size - from; > > > > > return iomap_zero_range(inode, from, length, NULL, &gfs2_iomap_ops, > > > > > NULL); > > > > > } > > > > > > > > > > > > > > > > > > > > > Thanks, Andreas