Re: [PATCH 06/13] fuse: implement buffered IO with iomap

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 18, 2025 at 08:39:18PM +0200, Bernd Schubert wrote:
> 
> 
> On 7/18/25 20:01, Darrick J. Wong wrote:
> > On Fri, Jul 18, 2025 at 05:10:14PM +0200, Amir Goldstein wrote:
> >> On Fri, Jul 18, 2025 at 1:32 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> >>>
> >>> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> >>>
> >>> Implement pagecache IO with iomap, complete with hooks into truncate and
> >>> fallocate so that the fuse server needn't implement disk block zeroing
> >>> of post-EOF and unaligned punch/zero regions.
> >>>
> >>> Signed-off-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>
> >>> ---
> >>>  fs/fuse/fuse_i.h          |   46 +++
> >>>  fs/fuse/fuse_trace.h      |  391 ++++++++++++++++++++++++
> >>>  include/uapi/linux/fuse.h |    5
> >>>  fs/fuse/dir.c             |   23 +
> >>>  fs/fuse/file.c            |   90 +++++-
> >>>  fs/fuse/file_iomap.c      |  723 +++++++++++++++++++++++++++++++++++++++++++++
> >>>  fs/fuse/inode.c           |   14 +
> >>>  7 files changed, 1268 insertions(+), 24 deletions(-)
> >>>
> >>>
> >>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> >>> index 67e428da4391aa..f33b348d296d5e 100644
> >>> --- a/fs/fuse/fuse_i.h
> >>> +++ b/fs/fuse/fuse_i.h
> >>> @@ -161,6 +161,13 @@ struct fuse_inode {
> >>>
> >>>                         /* waitq for direct-io completion */
> >>>                         wait_queue_head_t direct_io_waitq;
> >>> +
> >>> +#ifdef CONFIG_FUSE_IOMAP
> >>> +                       /* pending io completions */
> >>> +                       spinlock_t ioend_lock;
> >>> +                       struct work_struct ioend_work;
> >>> +                       struct list_head ioend_list;
> >>> +#endif
> >>>                 };
> >>
> >> This union member you are changing is declared for
> >> /* read/write io cache (regular file only) */
> >> but actually it is also for parallel dio and passthrough mode
> >>
> >> IIUC, there should be zero intersection between these io modes and
> >>  /* iomap cached fileio (regular file only) */
> >>
> >> Right?
> > 
> > Right.  iomap will get very very confused if you switch file IO paths on
> > a live file.  I think it's /possible/ to switch if you flush and
> > truncate the whole page cache while holding inode_lock() but I don't
> > think anyone has ever tried.
> > 
> >> So it can use its own union member without increasing fuse_inode size.
> >>
> >> Just need to be carefull in fuse_init_file_inode(), fuse_evict_inode() and
> >> fuse_file_io_release() which do not assume a specific inode io mode.
> > 
> > Yes, I think it's possible to put the iomap stuff in a separate struct
> > within that union so that we're not increasing the fuse_inode size
> > unnecessarily.  That's desirable for something to do before merging,
> > but for now prototyping is /much/ easier if I don't have to do that.
> > 
> > Making that change will require a lot of careful auditing, first I want
> > to make sure you all agree with the iomap approach because it's much
> > different from what I see in the other fuse IO paths. :)
> > 
> > Eeeyiks, struct fuse_inode shrinks from 1272 bytes to 1152 if I push the
> > iomap stuff into its own union struct.
> > 
> >> Was it your intention to allow filesystems to configure some inodes to be
> >> in file_iomap mode and other inodes to be in regular cached/direct/passthrough
> >> io modes?
> > 
> > That was a deliberate design decision on my part -- maybe a fuse server
> > would be capable of serving up some files from a local disk, and others
> > from (say) a network filesystem.  Or maybe it would like to expose an
> > administrative fd for the filesystem (like the xfs_healer event stream)
> > that isn't backed by storage.
> > 
> >> I can't say that I see a big benefit in allowing such setups.
> >> It certainly adds a lot of complication to the test matrix if we allow that.
> >> My instinct is for initial version, either allow only opening files in
> >> FILE_IOMAP or
> >> DIRECT_IOMAP to inodes for a filesystem that supports those modes.
> > 
> > I was thinking about combining FUSE_ATTR_IOMAP_(DIRECTIO|FILEIO) for the
> > next RFC because I can't imagine any scenario where you don't want
> > directio support if you already use iomap for the pagecache.  fuse iomap
> > requires directio write support for writeback, so the server *must*
> > support IOMAP_WRITE|IOMAP_DIRECT.
> > 
> >> Perhaps later we can allow (and maybe fallback to) FOPEN_DIRECT_IO
> >> (without parallel dio) if a server does not configure IOMAP to some inode
> >> to allow a server to provide the data for a specific inode directly.
> > 
> > Hrmm.  Is FOPEN_DIRECT_IO the magic flag that fuse passes to the fuse
> > server to tell it that a file is open in directio mode?  There's a few
> > fstests that initiate aio+dio writes to a dm-error device that currently
> > fail in non-iomap mode because fuse2fs writes everything to the bdev
> > pagecache.
> 
> 
> The other way around, FOPEN_DIRECT_IO is a flag that fuse-server tells
> the kernel that it wants to bypass the page cache. And also allows
> parallel DIO IO (shared vs exclusive lock).

Oh ok.  iomap supports parallel directio writes, but one has to be
careful to drop to synchronous mode for file extending and unaligned
writes so I've left it out of the prototype for now.  (Parallel reads
are supported by default.)

Hrmm I'll have to study these more...

--D

> 
> Thanks,
> Bernd
> 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux