On Tue, Jun 10, 2025 at 9:04 PM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jun 10, 2025 at 01:13:09PM -0700, Joanne Koong wrote:
> > > synchronous ones. And if the file system fragmented the folio so badly
> > > that we'll now need to do more than two reads we're still at least
> > > pipelining it, although that should basically never happen with modern
> > > file systems.
> >
> > If the filesystem wants granular folio reads, it can also just do that
> > itself by calling an iomap helper (eg what iomap_adjust_read_range()
> > is doing right now) in its ->read_folio() implementation, correct?
>
> Well, nothing tells ->read_folio how much to read. But having a new

Not a great idea, but theoretically we could stash that info (offset and
len) in the folio->private iomap_folio_state struct. I don't think that
runs into synchronization issues, since it would be set and cleared while
the folio lock is held for that read.

But regardless, I think we still need a new variant of read_folio,
because if a non-block-io iomap user wants to use iomap_read_folio() /
iomap_readahead() for the granular uptodate parsing logic that's in
there, it'll need to provide a method for reading a partial folio. I
initially wasn't planning to have fuse use iomap_read_folio() /
iomap_readahead(), but I realized there are some cases where fuse will
find it useful, so I'm planning to add that in.

> variant of read_folio that allows partial reads might still be nicer
> than a iomap_folio_op. Let me draft that and see if willy or other mm
> folks choke on it :)

writeback_folio() is also a VM-level concept, so by that same logic,
should writeback_folio() also be an address space operation?

A more general question I've been trying to figure out is whether the
vision is for iomap to become the de facto generic library that
all/most filesystems will be using in the future?
If so, then it makes sense to me to add this to the address space
operations, but if not, then I don't see the hate for having the folio
callbacks be embedded in iomap_folio_op.

> > For fuse at least, we definitely want granular reads, since reads may
> > be extremely expensive (eg it may be a network fetch) and there's
> > non-trivial memcpy overhead incurred with fuse needing to memcpy read
> > buffer data from userspace back to the kernel.
>
> Ok, with that the plain ->read_folio variant is not going to fly.
>
> > > +	folio_lock(folio);
> > > +	if (unlikely(folio->mapping != inode->i_mapping))
> > > +		return 1;
> > > +	if (unlikely(!iomap_validate(iter)))
> > > +		return 1;
> >
> > Does this now basically mean that every caller that uses iomap for
> > writes will have to implement ->iomap_valid and bump the sequence
> > counter any time there's a write or truncate, in case the folio
> > changes during the lock drop? Or were we already supposed to be
> > doing this?
>
> Not any more than before. It's still optional, but you still very
> much want it to protect against races updating the mapping.

Okay, thanks. I think I'll need to add this in for fuse then. I'll look
at this some more.
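For anyone following along, the sequence-counter revalidation pattern
being discussed reduces to something like the toy model below. All the
names here (toy_inode, toy_iomap, etc.) are illustrative stand-ins, not
the real iomap API; the point is just the shape of the protocol: the
writer bumps a counter under the lock whenever it changes the mapping,
the reader samples the counter when it looks up a mapping, and an
->iomap_valid-style check compares the two after the lock was dropped
and retaken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of sequence-counter-based mapping revalidation.
 * Names are hypothetical; this is not the actual iomap interface.
 */
struct toy_inode {
	uint64_t mapping_seq;	/* bumped on every mapping change */
};

struct toy_iomap {
	uint64_t seq;		/* counter value sampled at lookup time */
};

/* Called (under the inode lock) whenever the extent mapping changes,
 * e.g. on a write that allocates blocks or on a truncate. */
static void toy_mapping_changed(struct toy_inode *inode)
{
	inode->mapping_seq++;
}

/* Sample the counter when handing a mapping out to the iomap machinery. */
static void toy_iomap_begin(const struct toy_inode *inode,
			    struct toy_iomap *iomap)
{
	iomap->seq = inode->mapping_seq;
}

/* The ->iomap_valid-style check: after relocking, is the cached mapping
 * still current, or did a racing write/truncate invalidate it? */
static bool toy_iomap_valid(const struct toy_inode *inode,
			    const struct toy_iomap *iomap)
{
	return iomap->seq == inode->mapping_seq;
}
```

If the check fails, the caller is expected to drop the stale mapping and
redo the lookup, which is what makes it safe to release the folio lock
between the mapping lookup and the data copy.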