Re: [PATCH v7 3/3] fuse: remove tmp folio for writebacks and internal rb tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



  On Tue, Apr 8, 2025 at 7:43 PM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Joanne,
>
> On 4/5/25 2:14 AM, Joanne Koong wrote:
> > In the current FUSE writeback design (see commit 3be5a52b30aa
> > ("fuse: support writable mmap")), a temp page is allocated for every
> > dirty page to be written back, the contents of the dirty page are copied over
> > to the temp page, and the temp page gets handed to the server to write back.
> >
> > This is done so that writeback may be immediately cleared on the dirty page,
> > and this in turn is done in order to mitigate the following deadlock scenario
> > that may arise if reclaim waits on writeback on the dirty page to complete:
> > * single-threaded FUSE server is in the middle of handling a request
> >   that needs a memory allocation
> > * memory allocation triggers direct reclaim
> > * direct reclaim waits on a folio under writeback
> > * the FUSE server can't write back the folio since it's stuck in
> >   direct reclaim
> >
> > With a recent change that added AS_WRITEBACK_INDETERMINATE and mitigates
> > the situations described above, FUSE writeback does not need to use
> > temp pages if it sets AS_WRITEBACK_INDETERMINATE on its inode mappings.
> >
> > This commit sets AS_WRITEBACK_INDETERMINATE on the inode mappings
> > and removes the temporary pages + extra copying and the internal rb
> > tree.
> >
> > fio benchmarks --
> > (using averages observed from 10 runs, throwing away outliers)
> >
> > Setup:
> > sudo mount -t tmpfs -o size=30G tmpfs ~/tmp_mount
> >  ./libfuse/build/example/passthrough_ll -o writeback -o max_threads=4 -o source=~/tmp_mount ~/fuse_mount
> >
> > fio --name=writeback --ioengine=sync --rw=write --bs={1k,4k,1M} --size=2G
> > --numjobs=2 --ramp_time=30 --group_reporting=1 --directory=/root/fuse_mount
> >
> >         bs =  1k          4k            1M
> > Before  351 MiB/s     1818 MiB/s     1851 MiB/s
> > After   341 MiB/s     2246 MiB/s     2685 MiB/s
> > % diff        -3%          23%         45%
> >
> > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> > Reviewed-by: Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx>
> > Acked-by: Miklos Szeredi <mszeredi@xxxxxxxxxx>
>

Hi Jingbo,

Thanks for sharing your analysis for this.

> Overall this patch LGTM.
>
> Apart from that, IMO the fi->writectr and fi->queued_writes mechanism is
> also unneeded then, at least the DIRECT IO routine (i.e.

I took a look at fi->writectr and fi->queued_writes and my
understanding is that we do still need this. For example, for
truncates (I'm looking at fuse_do_setattr()), I think we still need to
prevent concurrent writeback or else the setattr request and the
writeback request could race which would result in a mismatch between
the file's reported size and the actual data written to disk.

> fuse_direct_io()) doesn't need fuse_sync_writes() anymore.  That is
> because after removing the temp page, the DIRECT IO routine has already
> been waiting for all inflight WRITE requests, see
>
> # DIRECT read
> generic_file_read_iter
>   kiocb_write_and_wait
>     filemap_write_and_wait_range

Where do you see generic_file_read_iter() getting called for direct io reads?

For direct io reads, I'm only seeing

fuse_direct_IO()
  __fuse_direct_read()
    fuse_direct_io()

and

fuse_file_read_iter()
    fuse_direct_read_iter()
        fuse_direct_IO() / __fuse_direct_read()

>
> # DIRECT write
> generic_file_write_iter
>   generic_file_direct_write
>     kiocb_invalidate_pages
>       filemap_invalidate_pages
>         filemap_write_and_wait_range

Similarly, where do you see generic_file_write_iter() getting called
for direct io writes?
My understanding is that it'd either go through fuse_file_write_iter()
-> fuse_direct_write_iter() or through the fuse_direct_IO() callback.

>
> The DIRECT write routine will also invalidate the page cache in the
> range that is written to, so that the following buffer write needs to
> read the page cache back first. The writeback following the buffer write
> is much likely after the DIRECT write, so that the writeback won't
> conflict with the DIRECT write (i.e. there won't be duplicate WRITE
> requests for the same page that are initiated from DIRECT write and
> writeback at the same time), which is exactly why fi->writectr and
> fi->queued_writes are introduced.

Where do you see fi->writectr / fi->queued-writes preventing this
race? It looks to me like in the existing code, this race condition
you described of direct write invalidating the page cache, then
another buffer write reads the page cache and dirties it, then
writeback is called on that, and the 2 write requests racing, could
still happen?


> However it seems that the writeback
> won't wait for previous inflight DIRECT WRITE requests, so I'm not much
> sure about that.  Maybe other folks could offer more insights...

My understanding is that these lines

if (!cuse && filemap_range_has_writeback(...)) {
   ...
   fuse_sync_writes(inode);
   ...
}

in fuse_direct_io() is what waits on previous inflight direct write
requests to complete before the direct io happens.


>
> Also fuse_sync_writes() is not needed in fuse_flush() anymore, with
> which I'm pretty sure.

Why don't we still need this for fuse_flush()?

If a caller calls close(), this will call

filp_close()
  filp_flush()
      filp->f_op->flush()
          fuse_flush()

it seems like we should still be waiting for all writebacks to finish
before sending the fuse server the fuse_flush request, no?

>
> The potential cleanup for fi->writectr and fi->queued_writes could be
> offered as following separate patches (if any).
>

Thanks,
Joanne
>
> > ---
> >  fs/fuse/file.c   | 360 ++++-------------------------------------------
> >  fs/fuse/fuse_i.h |   3 -
> >  2 files changed, 28 insertions(+), 335 deletions(-)
> >
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 754378dd9f71..91ada0208863 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -415,89 +415,11 @@ u64 fuse_lock_owner_id(struct fuse_conn *fc, fl_owner_t id)
> >
> >  struct fuse_writepage_args {
> >       struct fuse_io_args ia;
> > -     struct rb_node writepages_entry;
> >       struct list_head queue_entry;
> > -     struct fuse_writepage_args *next;
> >       struct inode *inode;
> >       struct fuse_sync_bucket *bucket;
> >  };
> >
> > -static struct fuse_writepage_args *fuse_find_writeback(struct fuse_inode *fi,
> > -                                         pgoff_t idx_from, pgoff_t idx_to)
> > -{
> > -     struct rb_node *n;
> > -
> > -     n = fi->writepages.rb_node;
> > -
> > -     while (n) {
> > -             struct fuse_writepage_args *wpa;
> > -             pgoff_t curr_index;
> > -
>
> --
> Thanks,
> Jingbo





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux