On Tue, May 27, 2025 at 09:43:42AM -0600, Jens Axboe wrote: > DONTCACHE I/O must have the completion punted to a workqueue, just like > what is done for unwritten extents, as the completion needs task context > to perform the invalidation of the folio(s). However, if writeback is > started off filemap_fdatawrite_range() off generic_sync() and it's an > overwrite, then the DONTCACHE marking gets lost as iomap_add_to_ioend() > don't look at the folio being added and no further state is passed down > to help it know that this is a dropbehind/DONTCACHE write. > > Check if the folio being added is marked as dropbehind, and set > IOMAP_IOEND_DONTCACHE if that is the case. Then XFS can factor this into > the decision making of completion context in xfs_submit_ioend(). > Additionally include this ioend flag in the NOMERGE flags, to avoid > mixing it with unrelated IO. > > This fixes extra page cache being instantiated when the write performed > is an overwrite, rather than newly instantiated blocks. > > Fixes: b2cd5ae693a3 ("iomap: make buffered writes work with RWF_DONTCACHE") > Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> > > --- > > Found this one while testing the unrelated issue of invalidation being a > bit broken before 6.15 release. We need this to ensure that overwrites > also prune correctly, just like unwritten extents currently do. I wondered about the stack traces showing DONTCACHE writeback completion being handled from irq context[*] when I read the -fsdevel thread about broken DONTCACHE functionality yesterday. [*] second trace in the failure reported in this comment: https://lore.kernel.org/linux-fsdevel/432302ad-aa95-44f4-8728-77e61cc1f20c@xxxxxxxxx/ > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c > index 233abf598f65..3729391a18f3 100644 > --- a/fs/iomap/buffered-io.c > +++ b/fs/iomap/buffered-io.c > @@ -1691,6 +1691,8 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, > ioend_flags |= IOMAP_IOEND_UNWRITTEN; > if (wpc->iomap.flags & IOMAP_F_SHARED) > ioend_flags |= IOMAP_IOEND_SHARED; > + if (folio_test_dropbehind(folio)) > + ioend_flags |= IOMAP_IOEND_DONTCACHE; > if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY)) > ioend_flags |= IOMAP_IOEND_BOUNDARY; > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index 26a04a783489..1b7a006402ea 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -436,6 +436,9 @@ xfs_map_blocks( > return 0; > } > > +#define IOEND_WQ_FLAGS (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED | \ > + IOMAP_IOEND_DONTCACHE) > + > static int > xfs_submit_ioend( > struct iomap_writepage_ctx *wpc, > @@ -460,8 +463,7 @@ xfs_submit_ioend( > memalloc_nofs_restore(nofs_flag); > > /* send ioends that might require a transaction to the completion wq */ > - if (xfs_ioend_is_append(ioend) || > - (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED))) > + if (xfs_ioend_is_append(ioend) || ioend->io_flags & IOEND_WQ_FLAGS) > ioend->io_bio.bi_end_io = xfs_end_bio; > > if (status) IMO, this would be cleaner as a helper so that individual cases can be commented correctly, as page cache invalidation does not actually require a transaction... Something like: static bool xfs_ioend_needs_wq_completion( struct xfs_ioend *ioend) { /* Changing inode size requires a transaction. */ if (xfs_ioend_is_append(ioend)) return true; /* Extent manipulation requires a transaction. */ if (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED)) return true; /* Page cache invalidation cannot be done in irq context. */ if (ioend->io_flags & IOMAP_IOEND_DONTCACHE) return true; return false; } Otherwise seems fine. -Dave. -- Dave Chinner david@xxxxxxxxxxxxx