Re: [PATCH 06/13] fuse: implement buffered IO with iomap

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 18, 2025 at 8:01 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> On Fri, Jul 18, 2025 at 05:10:14PM +0200, Amir Goldstein wrote:
> > On Fri, Jul 18, 2025 at 1:32 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > >
> > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > >
> > > Implement pagecache IO with iomap, complete with hooks into truncate and
> > > fallocate so that the fuse server needn't implement disk block zeroing
> > > of post-EOF and unaligned punch/zero regions.
> > >
> > > Signed-off-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>
> > > ---
> > >  fs/fuse/fuse_i.h          |   46 +++
> > >  fs/fuse/fuse_trace.h      |  391 ++++++++++++++++++++++++
> > >  include/uapi/linux/fuse.h |    5
> > >  fs/fuse/dir.c             |   23 +
> > >  fs/fuse/file.c            |   90 +++++-
> > >  fs/fuse/file_iomap.c      |  723 +++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/fuse/inode.c           |   14 +
> > >  7 files changed, 1268 insertions(+), 24 deletions(-)
> > >
> > >
> > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > > index 67e428da4391aa..f33b348d296d5e 100644
> > > --- a/fs/fuse/fuse_i.h
> > > +++ b/fs/fuse/fuse_i.h
> > > @@ -161,6 +161,13 @@ struct fuse_inode {
> > >
> > >                         /* waitq for direct-io completion */
> > >                         wait_queue_head_t direct_io_waitq;
> > > +
> > > +#ifdef CONFIG_FUSE_IOMAP
> > > +                       /* pending io completions */
> > > +                       spinlock_t ioend_lock;
> > > +                       struct work_struct ioend_work;
> > > +                       struct list_head ioend_list;
> > > +#endif
> > >                 };
> >
> > This union member you are changing is declared for
> > /* read/write io cache (regular file only) */
> > but actually it is also for parallel dio and passthrough mode
> >
> > IIUC, there should be zero intersection between these io modes and
> >  /* iomap cached fileio (regular file only) */
> >
> > Right?
>
> Right.  iomap will get very very confused if you switch file IO paths on
> a live file.  I think it's /possible/ to switch if you flush and
> truncate the whole page cache while holding inode_lock() but I don't
> think anyone has ever tried.
>
> > So it can use its own union member without increasing fuse_inode size.
> >
> > Just need to be carefull in fuse_init_file_inode(), fuse_evict_inode() and
> > fuse_file_io_release() which do not assume a specific inode io mode.
>
> Yes, I think it's possible to put the iomap stuff in a separate struct
> within that union so that we're not increasing the fuse_inode size
> unnecessarily.  That's desirable for something to do before merging,
> but for now prototyping is /much/ easier if I don't have to do that.
>

understood. you can deal with that later. I just wanted to leave a TODO note.

> Making that change will require a lot of careful auditing, first I want
> to make sure you all agree with the iomap approach because it's much
> different from what I see in the other fuse IO paths. :)
>

Indeed a good audit will be required, but
*if* you can guarantee to configure iomap alway at inode initiation
then in fuse_init_file_inode() it is clear, which member of the union
is being initialized and this mode has to stick with the inode until
evict anyway.

So basically, all you need to do is never allow configuring iomap on an
already initialized inode.

> Eeeyiks, struct fuse_inode shrinks from 1272 bytes to 1152 if I push the
> iomap stuff into its own union struct.
>
> > Was it your intention to allow filesystems to configure some inodes to be
> > in file_iomap mode and other inodes to be in regular cached/direct/passthrough
> > io modes?
>
> That was a deliberate design decision on my part -- maybe a fuse server
> would be capable of serving up some files from a local disk, and others
> from (say) a network filesystem.  Or maybe it would like to expose an
> administrative fd for the filesystem (like the xfs_healer event stream)
> that isn't backed by storage.
>

Understood.

But the filesystem should be able to make the decision on inode
initiation time (lookup)
and once made, this decision sticks throughout the inode lifetime. Right?

> > I can't say that I see a big benefit in allowing such setups.
> > It certainly adds a lot of complication to the test matrix if we allow that.
> > My instinct is for initial version, either allow only opening files in
> > FILE_IOMAP or
> > DIRECT_IOMAP to inodes for a filesystem that supports those modes.
>
> I was thinking about combining FUSE_ATTR_IOMAP_(DIRECTIO|FILEIO) for the
> next RFC because I can't imagine any scenario where you don't want
> directio support if you already use iomap for the pagecache.  fuse iomap
> requires directio write support for writeback, so the server *must*
> support IOMAP_WRITE|IOMAP_DIRECT.
>
> > Perhaps later we can allow (and maybe fallback to) FOPEN_DIRECT_IO
> > (without parallel dio) if a server does not configure IOMAP to some inode
> > to allow a server to provide the data for a specific inode directly.
>
> Hrmm.  Is FOPEN_DIRECT_IO the magic flag that fuse passes to the fuse
> server to tell it that a file is open in directio mode?  There's a few
> fstests that initiate aio+dio writes to a dm-error device that currently
> fail in non-iomap mode because fuse2fs writes everything to the bdev
> pagecache.
>

Not exactly, but nevermind, you can use a much simpler logic for what
you described:
iomap has to be configured on inode instantiation and never changed afterwards.
Other inodes are not going to be affected by iomap at all from that point on.

> > fuse_file_io_open/release() can help you manage those restrictions and
> > set ff->iomode = IOM_FILE_IOMAP when a file is opened for file iomap.
> > I did not look closely enough to see if file_iomap code ends up setting
> > ff->iomode = IOM_CACHED/UNCACHED or always remains IOM_NONE.
>
> I don't touch ff->iomode because iomap is a per-inode property, not a
> per-file property... but I suppose that would be a good place to look.
>

Right, with cached/direct/passthrough the inode may change the iomode
after all files are closed, but we *do* keep the mode in the inode,
so we know that files cannot be opened in conflicting modes on the same inode.

The purpose of ff->iomode is to know if the file contributes to cached mode
positive iocachectr or to a negative passthrough mode refcount.

So setting ff->iomode = IOM_IOMAP just helps for annotating how the
file was opened, in case we are tracing it. There is no functional need to
define and set this mode on the file when the mode of the inode is const.

Thanks,
Amir.





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux