On 7/18/25 20:01, Darrick J. Wong wrote: > On Fri, Jul 18, 2025 at 05:10:14PM +0200, Amir Goldstein wrote: >> On Fri, Jul 18, 2025 at 1:32 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote: >>> >>> From: Darrick J. Wong <djwong@xxxxxxxxxx> >>> >>> Implement pagecache IO with iomap, complete with hooks into truncate and >>> fallocate so that the fuse server needn't implement disk block zeroing >>> of post-EOF and unaligned punch/zero regions. >>> >>> Signed-off-by: "Darrick J. Wong" <djwong@xxxxxxxxxx> >>> --- >>> fs/fuse/fuse_i.h | 46 +++ >>> fs/fuse/fuse_trace.h | 391 ++++++++++++++++++++++++ >>> include/uapi/linux/fuse.h | 5 >>> fs/fuse/dir.c | 23 + >>> fs/fuse/file.c | 90 +++++- >>> fs/fuse/file_iomap.c | 723 +++++++++++++++++++++++++++++++++++++++++++++ >>> fs/fuse/inode.c | 14 + >>> 7 files changed, 1268 insertions(+), 24 deletions(-) >>> >>> >>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h >>> index 67e428da4391aa..f33b348d296d5e 100644 >>> --- a/fs/fuse/fuse_i.h >>> +++ b/fs/fuse/fuse_i.h >>> @@ -161,6 +161,13 @@ struct fuse_inode { >>> >>> /* waitq for direct-io completion */ >>> wait_queue_head_t direct_io_waitq; >>> + >>> +#ifdef CONFIG_FUSE_IOMAP >>> + /* pending io completions */ >>> + spinlock_t ioend_lock; >>> + struct work_struct ioend_work; >>> + struct list_head ioend_list; >>> +#endif >>> }; >> >> This union member you are changing is declared for >> /* read/write io cache (regular file only) */ >> but actually it is also for parallel dio and passthrough mode >> >> IIUC, there should be zero intersection between these io modes and >> /* iomap cached fileio (regular file only) */ >> >> Right? > > Right. iomap will get very very confused if you switch file IO paths on > a live file. I think it's /possible/ to switch if you flush and > truncate the whole page cache while holding inode_lock() but I don't > think anyone has ever tried. > >> So it can use its own union member without increasing fuse_inode size. >> >> Just need to be carefull in fuse_init_file_inode(), fuse_evict_inode() and >> fuse_file_io_release() which do not assume a specific inode io mode. > > Yes, I think it's possible to put the iomap stuff in a separate struct > within that union so that we're not increasing the fuse_inode size > unnecessarily. That's desirable for something to do before merging, > but for now prototyping is /much/ easier if I don't have to do that. > > Making that change will require a lot of careful auditing, first I want > to make sure you all agree with the iomap approach because it's much > different from what I see in the other fuse IO paths. :) > > Eeeyiks, struct fuse_inode shrinks from 1272 bytes to 1152 if I push the > iomap stuff into its own union struct. > >> Was it your intention to allow filesystems to configure some inodes to be >> in file_iomap mode and other inodes to be in regular cached/direct/passthrough >> io modes? > > That was a deliberate design decision on my part -- maybe a fuse server > would be capable of serving up some files from a local disk, and others > from (say) a network filesystem. Or maybe it would like to expose an > administrative fd for the filesystem (like the xfs_healer event stream) > that isn't backed by storage. > >> I can't say that I see a big benefit in allowing such setups. >> It certainly adds a lot of complication to the test matrix if we allow that. >> My instinct is for initial version, either allow only opening files in >> FILE_IOMAP or >> DIRECT_IOMAP to inodes for a filesystem that supports those modes. > > I was thinking about combining FUSE_ATTR_IOMAP_(DIRECTIO|FILEIO) for the > next RFC because I can't imagine any scenario where you don't want > directio support if you already use iomap for the pagecache. fuse iomap > requires directio write support for writeback, so the server *must* > support IOMAP_WRITE|IOMAP_DIRECT. > >> Perhaps later we can allow (and maybe fallback to) FOPEN_DIRECT_IO >> (without parallel dio) if a server does not configure IOMAP to some inode >> to allow a server to provide the data for a specific inode directly. > > Hrmm. Is FOPEN_DIRECT_IO the magic flag that fuse passes to the fuse > server to tell it that a file is open in directio mode? There's a few > fstests that initiate aio+dio writes to a dm-error device that currently > fail in non-iomap mode because fuse2fs writes everything to the bdev > pagecache. The other way around, FOPEN_DIRECT_IO is a flag that fuse-server tells the kernel that it wants to bypass the page cache. And also allows parallel DIO IO (shared vs exclusive lock). Thanks, Bernd