On 5/27/25 3:12 PM, Dave Chinner wrote:
> On Tue, May 27, 2025 at 09:43:42AM -0600, Jens Axboe wrote:
>> DONTCACHE I/O must have the completion punted to a workqueue, just like
>> what is done for unwritten extents, as the completion needs task context
>> to perform the invalidation of the folio(s). However, if writeback is
>> started off filemap_fdatawrite_range() off generic_sync() and it's an
>> overwrite, then the DONTCACHE marking gets lost as iomap_add_to_ioend()
>> don't look at the folio being added and no further state is passed down
>> to help it know that this is a dropbehind/DONTCACHE write.
>>
>> Check if the folio being added is marked as dropbehind, and set
>> IOMAP_IOEND_DONTCACHE if that is the case. Then XFS can factor this into
>> the decision making of completion context in xfs_submit_ioend().
>> Additionally include this ioend flag in the NOMERGE flags, to avoid
>> mixing it with unrelated IO.
>>
>> This fixes extra page cache being instantiated when the write performed
>> is an overwrite, rather than newly instantiated blocks.
>>
>> Fixes: b2cd5ae693a3 ("iomap: make buffered writes work with RWF_DONTCACHE")
>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>
>> ---
>>
>> Found this one while testing the unrelated issue of invalidation being a
>> bit broken before 6.15 release. We need this to ensure that overwrites
>> also prune correctly, just like unwritten extents currently do.
>
> I wondered about the stack traces showing DONTCACHE writeback
> completion being handled from irq context[*] when I read the -fsdevel
> thread about broken DONTCACHE functionality yesterday.
>
> [*] second trace in the failure reported in this comment:
>
> https://lore.kernel.org/linux-fsdevel/432302ad-aa95-44f4-8728-77e61cc1f20c@xxxxxxxxx/

Indeed, though that could've been a "normal" write and not a DONTCACHE
one. But with the bug being fixed by this one, both would've gone that
path...

>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 233abf598f65..3729391a18f3 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -1691,6 +1691,8 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
>>                  ioend_flags |= IOMAP_IOEND_UNWRITTEN;
>>          if (wpc->iomap.flags & IOMAP_F_SHARED)
>>                  ioend_flags |= IOMAP_IOEND_SHARED;
>> +        if (folio_test_dropbehind(folio))
>> +                ioend_flags |= IOMAP_IOEND_DONTCACHE;
>>          if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY))
>>                  ioend_flags |= IOMAP_IOEND_BOUNDARY;
>>
>> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
>> index 26a04a783489..1b7a006402ea 100644
>> --- a/fs/xfs/xfs_aops.c
>> +++ b/fs/xfs/xfs_aops.c
>> @@ -436,6 +436,9 @@ xfs_map_blocks(
>>          return 0;
>>  }
>>
>> +#define IOEND_WQ_FLAGS  (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED | \
>> +                         IOMAP_IOEND_DONTCACHE)
>> +
>>  static int
>>  xfs_submit_ioend(
>>          struct iomap_writepage_ctx *wpc,
>> @@ -460,8 +463,7 @@ xfs_submit_ioend(
>>          memalloc_nofs_restore(nofs_flag);
>>
>>          /* send ioends that might require a transaction to the completion wq */
>> -        if (xfs_ioend_is_append(ioend) ||
>> -            (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED)))
>> +        if (xfs_ioend_is_append(ioend) || ioend->io_flags & IOEND_WQ_FLAGS)
>>                  ioend->io_bio.bi_end_io = xfs_end_bio;
>>
>>          if (status)
>
> IMO, this would be cleaner as a helper so that individual cases can
> be commented correctly, as page cache invalidation does not actually
> require a transaction...
>
> Something like:
>
> static bool
> xfs_ioend_needs_wq_completion(
>         struct xfs_ioend *ioend)
> {
>         /* Changing inode size requires a transaction. */
>         if (xfs_ioend_is_append(ioend))
>                 return true;
>
>         /* Extent manipulation requires a transaction. */
>         if (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED))
>                 return true;
>
>         /* Page cache invalidation cannot be done in irq context. */
>         if (ioend->io_flags & IOMAP_IOEND_DONTCACHE)
>                 return true;
>
>         return false;
> }
>
> Otherwise seems fine.

Yeah I like that, gets rid of the need to add the mask as well. I'll
spin a v2 and add the helper.

-- 
Jens Axboe
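
For context, a minimal sketch of what the v2 call site in
xfs_submit_ioend() could end up looking like once the suggested helper
replaces the flag mask (sketch only, assuming the helper quoted above;
the actual v2 may differ):

        /*
         * Sketch, not the posted v2: completions that need task context
         * (inode size updates, unwritten/shared extent conversion,
         * DONTCACHE folio invalidation) are routed to the completion
         * workqueue instead of being run from irq context.
         */
        if (xfs_ioend_needs_wq_completion(ioend))
                ioend->io_bio.bi_end_io = xfs_end_bio;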