Re: [PATCH v8 0/2] fuse: remove temp page copies in writeback

On Mon, Apr 14, 2025 at 3:47 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> On Mon, Apr 14, 2025 at 03:22:08PM -0700, Joanne Koong wrote:
> > The purpose of this patchset is to help make writeback in FUSE filesystems as
> > fast as possible.
> >
> > In the current FUSE writeback design (see commit 3be5a52b30aa
> > ("fuse: support writable mmap")), a temp page is allocated for every dirty
> > page to be written back, the contents of the dirty page are copied over to the
> > temp page, and the temp page gets handed to the server to write back. This is
> > done so that writeback may be immediately cleared on the dirty page, and this
> > in turn is done in order to mitigate the following deadlock scenario that may
> > arise if reclaim waits on writeback on the dirty page to complete (more
> > details can be found in this thread [1]):
> > * single-threaded FUSE server is in the middle of handling a request
> >   that needs a memory allocation
> > * memory allocation triggers direct reclaim
> > * direct reclaim waits on a folio under writeback
> > * the FUSE server can't write back the folio since it's stuck in
> >   direct reclaim
> >
> > Allocating and copying dirty pages to temp pages is the biggest performance
> > bottleneck for FUSE writeback. This patchset aims to get rid of the temp page
> > altogether (which will also allow us to get rid of the internal FUSE rb tree
> > that is needed to keep track of writeback status on the temp pages).
> > Benchmarks show approximately a 20% improvement in throughput for 4k
> > block-size writes and a 45% improvement for 1M block-size writes.
> >
> > In the current reclaim code, there is one scenario where writeback is waited
> > on, which is the case where the system is running legacy cgroup v1 and reclaim
> > encounters a folio that already has the reclaim flag set and the caller did
> > not have __GFP_FS (or __GFP_IO if swap) set.
> >
> > This patchset adds a new mapping flag, AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM,
> > which filesystems may set on their inode mappings to indicate that reclaim
> > should not wait on writeback. FUSE will set this flag on its mappings. Reclaim
> > for the legacy cgroup v1 case described above will skip reclaim of folios with
> > that flag set. With this flag set, FUSE can now remove temp pages altogether.
> >
> > With this change, writeback state is now only cleared on the dirty page after
> > the server has written it back to disk. If the server is deliberately
> > malicious or well-intentioned but buggy, this may stall sync(2) and page
> > migration. For sync(2), however, a malicious server may already stall it by
> > not replying to the FUSE_SYNCFS request, and for page migration, there are
> > already many easier ways to stall it by having FUSE permanently hold the
> > folio lock. A fuller discussion of this can be found in [2]. Long-term,
> > there needs to be a more comprehensive solution for addressing migration of
> > FUSE pages that handles all scenarios where FUSE may permanently hold the
> > lock, but that is outside the scope of this patchset and will be done as
> > future work. Please
> > note that this change now also ensures that when sync(2) returns, FUSE
> > filesystems will have persisted writeback changes.
> >
> > For this patchset, it would be ideal if the first patch could be taken by
> > Andrew to the mm tree and the second patch could be taken by Miklos into the
> > fuse tree, as the fuse large folios patchset [3] depends on the second patch.
>
> Why not take both patches through FUSE tree? Second patch has dependency
> on first patch, so there is no need to keep them separate.

If that's possible, that sounds great to me too. The patchset went
through Andrew's mm tree last time, so I'm not sure if the protocol is
that any/all mm changes need to go through Andrew's tree.
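
For anyone skimming the cover letter quoted above, here is a rough sketch
of the reclaim rule it describes. This is a standalone userspace model and
not the kernel code: every name in it (reclaim_decide, struct folio_model,
the enum values) is made up for illustration, and legacy_wait_case simply
stands in for "the legacy cgroup v1 conditions under which reclaim would
otherwise wait on writeback". The only thing it is meant to show is that a
mapping with the new flag set turns a would-be wait into a skip.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum reclaim_action {
    RECLAIM_PROCEED, /* folio is not under writeback; reclaim continues */
    RECLAIM_SKIP,    /* folio is under writeback; leave it and do not wait */
    RECLAIM_WAIT     /* block until writeback on the folio completes */
};

struct folio_model {
    bool under_writeback;
    /* Models AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM on the folio's mapping. */
    bool mapping_may_deadlock;
};

/*
 * legacy_wait_case: assumed to be true only in the legacy cgroup v1
 * scenario where reclaim would otherwise wait on writeback.
 */
static enum reclaim_action
reclaim_decide(const struct folio_model *f, bool legacy_wait_case)
{
    if (!f->under_writeback)
        return RECLAIM_PROCEED;

    /* Waiting is only allowed when the mapping has not opted out. */
    if (legacy_wait_case && !f->mapping_may_deadlock)
        return RECLAIM_WAIT;

    return RECLAIM_SKIP;
}

/* Small self-check covering the four cases of the model. */
static void reclaim_decide_self_check(void)
{
    struct folio_model plain = { .under_writeback = true,
                                 .mapping_may_deadlock = false };
    struct folio_model fuse  = { .under_writeback = true,
                                 .mapping_may_deadlock = true };
    struct folio_model clean = { .under_writeback = false,
                                 .mapping_may_deadlock = true };

    assert(reclaim_decide(&plain, true) == RECLAIM_WAIT);
    assert(reclaim_decide(&fuse, true) == RECLAIM_SKIP);
    assert(reclaim_decide(&plain, false) == RECLAIM_SKIP);
    assert(reclaim_decide(&clean, true) == RECLAIM_PROCEED);
}
```

In this model a FUSE folio under writeback is never waited on, so the
single-threaded server in the deadlock scenario cannot get stuck in direct
reclaim behind its own writeback.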

Thanks,
Joanne




