On Fri, Jul 18, 2025 at 05:10:14PM +0200, Amir Goldstein wrote: > On Fri, Jul 18, 2025 at 1:32 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote: > > > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > Implement pagecache IO with iomap, complete with hooks into truncate and > > fallocate so that the fuse server needn't implement disk block zeroing > > of post-EOF and unaligned punch/zero regions. > > > > Signed-off-by: "Darrick J. Wong" <djwong@xxxxxxxxxx> > > --- > > fs/fuse/fuse_i.h | 46 +++ > > fs/fuse/fuse_trace.h | 391 ++++++++++++++++++++++++ > > include/uapi/linux/fuse.h | 5 > > fs/fuse/dir.c | 23 + > > fs/fuse/file.c | 90 +++++- > > fs/fuse/file_iomap.c | 723 +++++++++++++++++++++++++++++++++++++++++++++ > > fs/fuse/inode.c | 14 + > > 7 files changed, 1268 insertions(+), 24 deletions(-) > > > > > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > > index 67e428da4391aa..f33b348d296d5e 100644 > > --- a/fs/fuse/fuse_i.h > > +++ b/fs/fuse/fuse_i.h > > @@ -161,6 +161,13 @@ struct fuse_inode { > > > > /* waitq for direct-io completion */ > > wait_queue_head_t direct_io_waitq; > > + > > +#ifdef CONFIG_FUSE_IOMAP > > + /* pending io completions */ > > + spinlock_t ioend_lock; > > + struct work_struct ioend_work; > > + struct list_head ioend_list; > > +#endif > > }; > > This union member you are changing is declared for > /* read/write io cache (regular file only) */ > but actually it is also for parallel dio and passthrough mode > > IIUC, there should be zero intersection between these io modes and > /* iomap cached fileio (regular file only) */ > > Right? Right. iomap will get very very confused if you switch file IO paths on a live file. I think it's /possible/ to switch if you flush and truncate the whole page cache while holding inode_lock() but I don't think anyone has ever tried. > So it can use its own union member without increasing fuse_inode size. > > Just need to be carefull in fuse_init_file_inode(), fuse_evict_inode() and > fuse_file_io_release() which do not assume a specific inode io mode. Yes, I think it's possible to put the iomap stuff in a separate struct within that union so that we're not increasing the fuse_inode size unnecessarily. That's desirable for something to do before merging, but for now prototyping is /much/ easier if I don't have to do that. Making that change will require a lot of careful auditing, first I want to make sure you all agree with the iomap approach because it's much different from what I see in the other fuse IO paths. :) Eeeyiks, struct fuse_inode shrinks from 1272 bytes to 1152 if I push the iomap stuff into its own union struct. > Was it your intention to allow filesystems to configure some inodes to be > in file_iomap mode and other inodes to be in regular cached/direct/passthrough > io modes? That was a deliberate design decision on my part -- maybe a fuse server would be capable of serving up some files from a local disk, and others from (say) a network filesystem. Or maybe it would like to expose an administrative fd for the filesystem (like the xfs_healer event stream) that isn't backed by storage. > I can't say that I see a big benefit in allowing such setups. > It certainly adds a lot of complication to the test matrix if we allow that. > My instinct is for initial version, either allow only opening files in > FILE_IOMAP or > DIRECT_IOMAP to inodes for a filesystem that supports those modes. I was thinking about combining FUSE_ATTR_IOMAP_(DIRECTIO|FILEIO) for the next RFC because I can't imagine any scenario where you don't want directio support if you already use iomap for the pagecache. fuse iomap requires directio write support for writeback, so the server *must* support IOMAP_WRITE|IOMAP_DIRECT. > Perhaps later we can allow (and maybe fallback to) FOPEN_DIRECT_IO > (without parallel dio) if a server does not configure IOMAP to some inode > to allow a server to provide the data for a specific inode directly. Hrmm. Is FOPEN_DIRECT_IO the magic flag that fuse passes to the fuse server to tell it that a file is open in directio mode? There's a few fstests that initiate aio+dio writes to a dm-error device that currently fail in non-iomap mode because fuse2fs writes everything to the bdev pagecache. > fuse_file_io_open/release() can help you manage those restrictions and > set ff->iomode = IOM_FILE_IOMAP when a file is opened for file iomap. > I did not look closely enough to see if file_iomap code ends up setting > ff->iomode = IOM_CACHED/UNCACHED or always remains IOM_NONE. I don't touch ff->iomode because iomap is a per-inode property, not a per-file property... but I suppose that would be a good place to look. Thank you for the feedback! --D