On Fri, Jun 20, 2025 at 11:15 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > On Wed, Jun 18, 2025 at 08:17:03AM -0400, Jeff Layton wrote: > > On Wed, 2025-06-11 at 05:34 +0100, Matthew Wilcox wrote: > > > On Mon, Jun 09, 2025 at 08:59:53PM -0700, Christoph Hellwig wrote: > > > > On Mon, Jun 09, 2025 at 10:14:44AM -0700, Darrick J. Wong wrote: > > > > > > Where "folio laundering" means calling ->launder_folio, right? > > > > > > > > > > What does fuse use folio laundering for, anyway? It looks to me like > > > > > the primary users are invalidate_inode_pages*. Either the caller cares > > > > > about flushing dirty data and has called filemap_write_and_wait_range; > > > > > or it doesn't and wants to tear down the pagecache ahead of some other > > > > > operation that's going to change the file contents and doesn't care. > > > > > > > > > > I suppose it could be useful as a last-chance operation on a dirty folio > > > > > that was dirtied after a filemap_write_and_wait_range but before > > > > > invalidate_inode_pages*? Though for xfs we just return EBUSY and let > > > > > the caller try again (or not). Is there a subtlety to fuse here that I > > > > > don't know about? > > > > > > > > My memory might be betraying me, but I think willy once launched an > > > > attempt to see if we can kill launder_folio. Adding him, and the > > > > mm and nfs lists to check if I have a point :) > > > > > > I ... got distracted with everything else. > > > > > > Looking at the original addition of ->launder_page (e3db7691e9f3), I > > > don't understand why we need it. invalidate_inode_pages2() isn't > > > supposed to invalidate dirty pages, so I don't understand why nfs > > > found it necessary to do writeback from ->releasepage() instead > > > of just returning false like iomap does. > > > > > > There's now a new question of what the hell btrfs is up to with > > > ->launder_folio, which they just added recently. > > > > IIRC... > > > > The problem was a race where a task could could dirty a page in a > > mmap'ed file after it had been written back but before it was unmapped > > from the pagecache. > > > > Bear in mind that the NFS client may need write back and then > > invalidate the pagecache for a file that is still in use if it > > discovers that the inode's attributes have changed on the server. > > > > Trond's solution was to write the page out while holding the page lock > > in this situation. I think we'd all welcome a way to avoid this race > > that didn't require launder_folio(). > > I think the problem is that we've left it up to the filesystem to handle > "what do we do if we've dirtied a page^W folio between, say, calling > filemap_write_and_wait_range() and calling filemap_release_folio()". > Just to make sure we're all on the same page here, this is the sample > path I'm looking at: > > __iomap_dio_rw > kiocb_invalidate_pages > filemap_invalidate_pages > filemap_write_and_wait_range > invalidate_inode_pages2_range > unmap_mapping_pages > folio_lock > folio_wait_writeback > folio_unmap_invalidate > unmap_mapping_folio > folio_launder > filemap_release_folio > if (folio_test_dirty(folio)) > return -EBUSY; > > So some filesystems opt to write back the folio which has been dirtied > (by implementing ->launder_folio) and others opt to fail (and fall back to > buffered I/O when the user has requested direct I/O). iomap filesystems > all just "return false" for dirty folios, so it's clearly an acceptable > outcome as far as xfstests go. > > The question is whether this is acceptable for all the filesystem > which implement ->launder_folio today. Because we could just move the > folio_test_dirty() to after the folio_lock() and remove all the testing > of folio dirtiness from individual filesystems. Or could the filesystems that implement ->launder_folio (from what I see, there's only 4: fuse, nfs, btrfs, and orangefs) just move that logic into their .release_folio implementation? I don't see why not. In folio_unmap_invalidate(), we call: ret = folio_launder(mapping, folio); if (ret) return ret; if (folio->mapping != mapping) return -EBUSY; if (!filemap_release_folio(folio, gfp)) return -EBUSY; If fuse, nfs, btrfs, and orangefs absolutely need to do whatever logic they're doing in .launder_folio, could they not just move it into .release_folio? > > Or have I missed something and picked the wrong sample path for > analysing why we do/don't need to writeback folios in > invalidate_inode_pages2_range()?