On Wed, Jul 23, 2025 at 10:06 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> On Mon, Jul 21, 2025 at 01:05:02PM -0700, Joanne Koong wrote:
> > On Sat, Jul 19, 2025 at 12:18 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > >
> > > On Sat, Jul 19, 2025 at 12:23 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jul 17, 2025 at 4:26 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > > > >
> > > > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > > >
> > > > > generic/488 fails with fuse2fs in the following fashion:
> > > > >
> > > > > Unfortunately, the 488.full file shows that there are a lot of hidden
> > > > > files left over in the filesystem, with incorrect link counts. Tracing
> > > > > fuse_request_* shows that there are a large number of FUSE_RELEASE
> > > > > commands that are queued up on behalf of the unlinked files at the time
> > > > > that fuse_conn_destroy calls fuse_abort_conn. Had the connection not
> > > > > aborted, the fuse server would have responded to the RELEASE commands by
> > > > > removing the hidden files; instead they stick around.
> > > >
> > > > Tbh it's still weird to me that FUSE_RELEASE is asynchronous instead
> > > > of synchronous. For example, for fuse servers that cache their data and
> > > > only write the buffer out to some remote filesystem when the file gets
> > > > closed, it seems useful for them to (like nfs) be able to return an
> > > > error to the client for close() if there's a failure committing that
> > > > data; that also has clearer API semantics imo, eg users are guaranteed
> > > > that when close() returns, all the processing/cleanup for that file
> > > > has been completed. Async FUSE_RELEASE also seems kind of racy, eg if
> > > > the server holds local locks that get released in FUSE_RELEASE, and a
> > > > subsequent FUSE_OPEN happens before the FUSE_RELEASE but depends on
> > > > grabbing that lock, then we end up deadlocked if the server is
> > > > single-threaded.
> > >
> > > There is a very good reason for keeping FUSE_FLUSH and FUSE_RELEASE
> > > (as well as those vfs ops) separate.
> >
> > Oh interesting, I didn't realize FUSE_FLUSH also gets sent on the
> > release path. I had assumed FUSE_FLUSH was for the sync()/fsync()
>
> (That's FUSE_FSYNC)
>
> > case. But I see now that you're right, close() makes a call to
> > filp_flush() in the vfs layer. (and I now see there's FUSE_FSYNC for
> > the fsync() case)
>
> Yeah, flush-on-close (FUSE_FLUSH) is generally a good idea for
> "unreliable" filesystems -- either because they're remote, or because
> the local storage they're on could get yanked at any time. It's slow,
> but it papers over a lot of bugs and "bad" usage.
>
> > > A filesystem can decide if it needs synchronous close() (not release).
> > > And with FOPEN_NOFLUSH, the filesystem can decide that per open file
> > > (unless it conflicts with a config like writeback cache).
> > >
> > > I have a filesystem which can do very slow io and some clients
> > > can get stuck doing open;fstat;close if close is always synchronous.
> > > I actually found the libfuse feature of async flush (FUSE_RELEASE_FLUSH)
> > > quite useful for my filesystem, so I carry a kernel patch to support it.
> > >
> > > The issue of racing that you mentioned sounds odd.
> > > First of all, who runs a single-threaded fuse server?
> > > Second, what does it matter if release is sync or async?
> > > FUSE_RELEASE will not be triggered by the same
> > > task calling FUSE_OPEN, so if there is a deadlock, it will happen
> > > with sync release as well.
> >
> > If the server is single-threaded, I think the FUSE_RELEASE would have
> > to be handled by the same task that handles FUSE_OPEN, so if the
> > release is synchronous, this would avoid the deadlock because that
> > guarantees the FUSE_RELEASE happens before the next FUSE_OPEN.
>
> On a single-threaded server(!) I would hope that the release would be
> issued to the fuse server before the open. (I'm not sure I understand

I don't think this is 100% guaranteed if fuse sends the release request
asynchronously rather than synchronously (eg the request gets stalled
on the bg queue if active_background >= max_background; see the sketch
at the end of this mail).

> where this part of the thread went, because why would that happen? And
> why would the fuse server hold a lock across requests?)

The fuse server holding a lock across requests example was a contrived
one to illustrate that an async release could be racy if a fuse server
implementation has the (standard?) expectation that releases and opens
are always received in order.

>
> > However now that you pointed out FUSE_FLUSH gets sent on the release
> > path, that addresses my worry about async FUSE_RELEASE returning
> > before the server has gotten a chance to write out its local buffer
> > cache.
>
> <nod>
>
> --D
>
> > Thanks,
> > Joanne
> >
> > > Thanks,
> > > Amir.
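
P.S. For reference, here is the bg queue gating I mean above. This is a
simplified paraphrase of flush_bg_queue() in fs/fuse/dev.c, not the
verbatim kernel code (in particular, queue_request() below is a
stand-in for the real enqueue path): background requests, which include
FUSE_RELEASE, sit on fc->bg_queue and only become readable by the
server while active_background < max_background.

/*
 * Simplified paraphrase of flush_bg_queue() in fs/fuse/dev.c.
 * Background requests (including FUSE_RELEASE) wait on
 * fc->bg_queue and are only moved to the input queue, where the
 * fuse server can read them, while the number of in-flight
 * background requests is below fc->max_background.
 */
static void flush_bg_queue(struct fuse_conn *fc)
{
	while (fc->active_background < fc->max_background &&
	       !list_empty(&fc->bg_queue)) {
		struct fuse_req *req;

		req = list_first_entry(&fc->bg_queue, struct fuse_req, list);
		list_del(&req->list);
		fc->active_background++;
		/*
		 * Only at this point does the request become visible
		 * to the fuse server; until then it is invisible on
		 * bg_queue.  (queue_request() is a hypothetical
		 * stand-in for the real enqueue helper.)
		 */
		queue_request(fc, req);
	}
}

So if max_background worth of background requests are already in
flight, the FUSE_RELEASE sits on bg_queue while a synchronous request
like the subsequent FUSE_OPEN goes straight to the input queue, and the
server can observe the open before the release.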