On Mon, Jun 9, 2025 at 9:24 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> On Fri, Jun 06, 2025 at 04:37:57PM -0700, Joanne Koong wrote:
> > Add a new iomap type, IOMAP_IN_MEM, that represents data that resides in
> > memory and does not map to or depend on the block layer and is not
> > embedded inline in an inode. This will be used for example by filesystems
> > such as FUSE where the data is in memory or needs to be fetched from a
> > server and is not coupled with the block layer. This lets these
> > filesystems use some of the internal features in iomaps such as
> > granular dirty tracking for large folios.
>
> How does this differ from using IOMAP_INLINE and setting
> iomap::inline_data = kmap_local_folio(...)? Is the situation here that
> FUSE already /has/ a folio from the mapping, so all you really need
> iomap to do is manage the folio's uptodate/dirty state?
>

I had looked into whether IOMAP_INLINE could be used, but there are a
few issues:

a) no granular uptodate reading of the folio if the folio needs to be
read into the page cache

If fuse uses IOMAP_INLINE then it'll need to read in all of the folio
being written to, because an IOMAP_INLINE mapping points to one
contiguous memory region, not separate chunks. For example, if there's
a 2 MB file and positions 0 to 1 MB of the file are represented by a
1 MB folio, and a client issues a write from position 1 to 1048575,
we'll need to read in the entire folio instead of just the first and
last chunks.

b) an extra memcpy is incurred if the folio needs to be read in (the
extra read comes from copying the inline data into the folio) and an
extra memcpy is incurred after the write (the extra copy comes from
writing the folio back to the inline data)

IOMAP_INLINE copies the inline data into the folio
(iomap_write_begin_inline() -> iomap_read_inline_data() ->
folio_fill_tail()), but for fuse the folio would already have had to
be fetched from the server in fuse's ->iomap_begin callback (and
similarly, the folio tail zeroing and dcache flush will be unnecessary
work here too). When the write is finished, there's an extra memcpy
incurred from iomap_write_end_inline() copying data from the folio
back to the inline data (for fuse, the inline data is already the
folio).

I guess we could add some flag that the filesystem can set in
->iomap_begin() to indicate that it's an IOMAP_INLINE type where the
inline memory is the folio being written, but that still doesn't help
with the issue in a).

c) IOMAP_INLINE isn't supported for writepages.

From what I see, this was added in commit 3e19e6f3e ("iomap: warn on
inline maps in iomap_writepage_map"). Maybe it's as simple as now
allowing inline maps to be used in writepages, but it also seems to
suggest that inline maps are meant for something different from what
fuse is trying to do with them.

> --D
>
> > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> > ---
> >  include/linux/iomap.h | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index 68416b135151..dbbf217eb03f 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -30,6 +30,7 @@ struct vm_fault;
> >  #define IOMAP_MAPPED	2	/* blocks allocated at @addr */
> >  #define IOMAP_UNWRITTEN	3	/* blocks allocated at @addr in unwritten state */
> >  #define IOMAP_INLINE	4	/* data inline in the inode */
> > +#define IOMAP_IN_MEM	5	/* data in memory, does not map to blocks */
> >
> >  /*
> >   * Flags reported by the file system from iomap_begin:
> > --
> > 2.47.1
> >
> >
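
For reference, here is a rough sketch of what reporting the new type
from a filesystem's ->iomap_begin callback could look like. This is
not part of the patch and the function name is made up for
illustration; the point is just that the callback only has to
describe the range and its state, with no addr, bdev, or inline_data
for iomap to chase:

#include <linux/fs.h>
#include <linux/iomap.h>

static int example_iomap_begin(struct inode *inode, loff_t pos,
			       loff_t length, unsigned int flags,
			       struct iomap *iomap, struct iomap *srcmap)
{
	/*
	 * Hypothetical sketch: no block mapping to report.  The data
	 * either already sits in the page cache folio or will be
	 * fetched from the server into it, so only the range and its
	 * type are filled in.
	 */
	iomap->type = IOMAP_IN_MEM;
	iomap->offset = pos;
	iomap->length = length;
	return 0;
}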