On Mon, Jun 09, 2025 at 02:28:52PM -0700, Joanne Koong wrote:
> On Mon, Jun 9, 2025 at 9:24 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> >
> > On Fri, Jun 06, 2025 at 04:37:57PM -0700, Joanne Koong wrote:
> > > Add a new iomap type, IOMAP_IN_MEM, that represents data that resides in
> > > memory and does not map to or depend on the block layer and is not
> > > embedded inline in an inode. This will be used for example by filesystems
> > > such as FUSE where the data is in memory or needs to be fetched from a
> > > server and is not coupled with the block layer. This lets these
> > > filesystems use some of the internal features in iomaps such as
> > > granular dirty tracking for large folios.
> >
> > How does this differ from using IOMAP_INLINE and setting
> > iomap::inline_data = kmap_local_folio(...)?  Is the situation here that
> > FUSE already /has/ a folio from the mapping, so all you really need
> > iomap to do is manage the folio's uptodate/dirty state?
> >
>
> I had looked into whether IOMAP_INLINE could be used but there are a few issues:
>
> a) no granular uptodate reading of the folio if the folio needs to be
> read into the page cache
>
> If fuse uses IOMAP_INLINE then it'll need to read in all the bytes of
> whatever needs to be written into the folio because the IOMAP_INLINE
> points to one contiguous memory region, not different chunks. For
> example if there's a 2 MB file and position 0 to 1 MB of the file is
> represented by a 1 MB folio, and a client issues a write from position
> 1 to 1048575, we'll need to read in the entire folio instead of just
> the first and last chunks.

Well, we could modify the IOMAP_INLINE code to handle iomap::offset > 0
so that you could keep feeding the pagecache inline mappings as packets
of data become available.

But that misses the point, since I think you already /have/ the folios
populated and stuffed in i_mapping; you just need iomap for the
sub-folio state tracking when things get dirty.
> b) an extra memcpy is incurred if the folio needs to be read in (extra
> read comes from reading inline data into the folio) and an extra memcpy
> is incurred after the write (extra write comes from writing from folio
> -> inline data)
>
> IOMAP_INLINE copies the inline data into the folio
> (iomap_write_begin_inline() -> iomap_read_inline_data() ->
> folio_fill_tail()) but for fuse, the folio would already have had to
> be fetched from the server in fuse's ->iomap_begin callback (and
> similarly, the folio tail zeroing and dcache flush will be
> unnecessary work here too). When the write is finished, there's an
> extra memcpy incurred from iomap_write_end_inline() copying data from
> the folio back to inline data (for fuse, the inline data is already the
> folio).
>
> I guess we could add some flag that the filesystem can set in
> ->iomap_begin() to indicate that it's an IOMAP_INLINE type where the
> mem is the folio being written, but that still doesn't help with the
> issue in a).

I think we already did something like that for fsdax.

> c) IOMAP_INLINE isn't supported for writepages. From what I see, this
> was added in commit 3e19e6f3e ("iomap: warn on inline maps in
> iomap_writepage_map"). Maybe it's as simple as now allowing inline
> maps to be used in writepages, but it also seems to suggest that inline
> maps are meant for something different than what fuse is trying to do
> with them.

Yeah -- the sole user (gfs2) stores the inline data near the inode, so
->iomap_begin initiates a transaction, locks the inode, and returns.
iomap copies data between the pagecache and iomap::addr, and calls
->iomap_end, which commits the transaction, unlocks the inode, and
cleans the page. That's why writeback doesn't support IOMAP_INLINE;
there are no users for it.

If ext4 ever gets to handling inline data via iomap, I think they'd do
a similar dance.
> > --D
> >
> > > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> > > ---
> > >  include/linux/iomap.h | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > > index 68416b135151..dbbf217eb03f 100644
> > > --- a/include/linux/iomap.h
> > > +++ b/include/linux/iomap.h
> > > @@ -30,6 +30,7 @@ struct vm_fault;
> > >  #define IOMAP_MAPPED 2 /* blocks allocated at @addr */
> > >  #define IOMAP_UNWRITTEN 3 /* blocks allocated at @addr in unwritten state */
> > >  #define IOMAP_INLINE 4 /* data inline in the inode */
> > > +#define IOMAP_IN_MEM 5 /* data in memory, does not map to blocks */
> > >
> > >  /*
> > >   * Flags reported by the file system from iomap_begin:
> > > --
> > > 2.47.1
> > >
> >