Re: [PATCH 06/13] fuse: implement buffered IO with iomap

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 7/18/25 20:01, Darrick J. Wong wrote:
> On Fri, Jul 18, 2025 at 05:10:14PM +0200, Amir Goldstein wrote:
>> On Fri, Jul 18, 2025 at 1:32 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>>>
>>> From: Darrick J. Wong <djwong@xxxxxxxxxx>
>>>
>>> Implement pagecache IO with iomap, complete with hooks into truncate and
>>> fallocate so that the fuse server needn't implement disk block zeroing
>>> of post-EOF and unaligned punch/zero regions.
>>>
>>> Signed-off-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>
>>> ---
>>>  fs/fuse/fuse_i.h          |   46 +++
>>>  fs/fuse/fuse_trace.h      |  391 ++++++++++++++++++++++++
>>>  include/uapi/linux/fuse.h |    5
>>>  fs/fuse/dir.c             |   23 +
>>>  fs/fuse/file.c            |   90 +++++-
>>>  fs/fuse/file_iomap.c      |  723 +++++++++++++++++++++++++++++++++++++++++++++
>>>  fs/fuse/inode.c           |   14 +
>>>  7 files changed, 1268 insertions(+), 24 deletions(-)
>>>
>>>
>>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
>>> index 67e428da4391aa..f33b348d296d5e 100644
>>> --- a/fs/fuse/fuse_i.h
>>> +++ b/fs/fuse/fuse_i.h
>>> @@ -161,6 +161,13 @@ struct fuse_inode {
>>>
>>>                         /* waitq for direct-io completion */
>>>                         wait_queue_head_t direct_io_waitq;
>>> +
>>> +#ifdef CONFIG_FUSE_IOMAP
>>> +                       /* pending io completions */
>>> +                       spinlock_t ioend_lock;
>>> +                       struct work_struct ioend_work;
>>> +                       struct list_head ioend_list;
>>> +#endif
>>>                 };
>>
>> This union member you are changing is declared for
>> /* read/write io cache (regular file only) */
>> but actually it is also for parallel dio and passthrough mode
>>
>> IIUC, there should be zero intersection between these io modes and
>>  /* iomap cached fileio (regular file only) */
>>
>> Right?
> 
> Right.  iomap will get very very confused if you switch file IO paths on
> a live file.  I think it's /possible/ to switch if you flush and
> truncate the whole page cache while holding inode_lock() but I don't
> think anyone has ever tried.
> 
>> So it can use its own union member without increasing fuse_inode size.
>>
>> Just need to be carefull in fuse_init_file_inode(), fuse_evict_inode() and
>> fuse_file_io_release() which do not assume a specific inode io mode.
> 
> Yes, I think it's possible to put the iomap stuff in a separate struct
> within that union so that we're not increasing the fuse_inode size
> unnecessarily.  That's desirable for something to do before merging,
> but for now prototyping is /much/ easier if I don't have to do that.
> 
> Making that change will require a lot of careful auditing, first I want
> to make sure you all agree with the iomap approach because it's much
> different from what I see in the other fuse IO paths. :)
> 
> Eeeyiks, struct fuse_inode shrinks from 1272 bytes to 1152 if I push the
> iomap stuff into its own union struct.
> 
>> Was it your intention to allow filesystems to configure some inodes to be
>> in file_iomap mode and other inodes to be in regular cached/direct/passthrough
>> io modes?
> 
> That was a deliberate design decision on my part -- maybe a fuse server
> would be capable of serving up some files from a local disk, and others
> from (say) a network filesystem.  Or maybe it would like to expose an
> administrative fd for the filesystem (like the xfs_healer event stream)
> that isn't backed by storage.
> 
>> I can't say that I see a big benefit in allowing such setups.
>> It certainly adds a lot of complication to the test matrix if we allow that.
>> My instinct is for initial version, either allow only opening files in
>> FILE_IOMAP or
>> DIRECT_IOMAP to inodes for a filesystem that supports those modes.
> 
> I was thinking about combining FUSE_ATTR_IOMAP_(DIRECTIO|FILEIO) for the
> next RFC because I can't imagine any scenario where you don't want
> directio support if you already use iomap for the pagecache.  fuse iomap
> requires directio write support for writeback, so the server *must*
> support IOMAP_WRITE|IOMAP_DIRECT.
> 
>> Perhaps later we can allow (and maybe fallback to) FOPEN_DIRECT_IO
>> (without parallel dio) if a server does not configure IOMAP to some inode
>> to allow a server to provide the data for a specific inode directly.
> 
> Hrmm.  Is FOPEN_DIRECT_IO the magic flag that fuse passes to the fuse
> server to tell it that a file is open in directio mode?  There's a few
> fstests that initiate aio+dio writes to a dm-error device that currently
> fail in non-iomap mode because fuse2fs writes everything to the bdev
> pagecache.


The other way around, FOPEN_DIRECT_IO is a flag that fuse-server tells
the kernel that it wants to bypass the page cache. And also allows
parallel DIO IO (shared vs exclusive lock).


Thanks,
Bernd




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux