On Wed, Jul 23, 2025 at 10:06 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> On Mon, Jul 21, 2025 at 01:05:02PM -0700, Joanne Koong wrote:
> > On Sat, Jul 19, 2025 at 12:18 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > >
> > > On Sat, Jul 19, 2025 at 12:23 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jul 17, 2025 at 4:26 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > > > >
> > > > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > > >
> > > > > generic/488 fails with fuse2fs in the following fashion:
> > > > >
> > > > > Unfortunately, the 488.full file shows that there are a lot of hidden
> > > > > files left over in the filesystem, with incorrect link counts. Tracing
> > > > > fuse_request_* shows that there are a large number of FUSE_RELEASE
> > > > > commands that are queued up on behalf of the unlinked files at the time
> > > > > that fuse_conn_destroy calls fuse_abort_conn. Had the connection not
> > > > > aborted, the fuse server would have responded to the RELEASE commands by
> > > > > removing the hidden files; instead they stick around.
> > > >
> > > > Tbh it's still weird to me that FUSE_RELEASE is asynchronous instead
> > > > of synchronous. For example, for fuse servers that cache their data and
> > > > only write the buffer out to some remote filesystem when the file gets
> > > > closed, it seems useful for them to (like nfs) be able to return an
> > > > error to the client for close() if there's a failure committing that
> > > > data; that also has clearer API semantics imo, eg users are guaranteed
> > > > that when close() returns, all the processing/cleanup for that file
> > > > has been completed. Async FUSE_RELEASE also seems kind of racy, eg if
> > > > the server holds local locks that get released in FUSE_RELEASE, and a
> > > > subsequent FUSE_OPEN happens before the FUSE_RELEASE but depends on
> > > > grabbing that lock, then we end up deadlocked if the server is
> > > > single-threaded.
> > >
> > > There is a very good reason for keeping FUSE_FLUSH and FUSE_RELEASE
> > > (as well as those vfs ops) separate.
> >
> > Oh interesting, I didn't realize FUSE_FLUSH also gets sent on the
> > release path. I had assumed FUSE_FLUSH was for the sync()/fsync()
>
> (That's FUSE_FSYNC)
>
> > case. But I see now that you're right, close() makes a call to
> > filp_flush() in the vfs layer. (and I now see there's FUSE_FSYNC for
> > the fsync() case)
>
> Yeah, flush-on-close (FUSE_FLUSH) is generally a good idea for
> "unreliable" filesystems -- either because they're remote, or because
> the local storage they're on could get yanked at any time. It's slow,
> but it papers over a lot of bugs and "bad" usage.
>
> > > A filesystem can decide if it needs synchronous close() (not release).
> > > And with FOPEN_NOFLUSH, the filesystem can decide that per open file
> > > (unless it conflicts with a config like writeback cache).
> > >
> > > I have a filesystem which can do very slow io and some clients
> > > can get stuck doing open;fstat;close if close is always synchronous.
> > > I actually found the libfuse feature of async flush (FUSE_RELEASE_FLUSH)
> > > quite useful for my filesystem, so I carry a kernel patch to support it.
> > >
> > > The issue of racing that you mentioned sounds odd.
> > > First of all, who runs a single-threaded fuse server?
> > > Second, what does it matter if release is sync or async?
> > > FUSE_RELEASE will not be triggered by the same
> > > task calling FUSE_OPEN, so if there is a deadlock, it will happen
> > > with sync release as well.
> >
> > If the server is single-threaded, I think the FUSE_RELEASE would have
> > to be handled by the same task that handles FUSE_OPEN, so if the
> > release is synchronous, this would avoid the deadlock because that
> > guarantees the FUSE_RELEASE happens before the next FUSE_OPEN.
>
> On a single-threaded server(!) I would hope that the release would be
> issued to the fuse server before the open. (I'm not sure I understand

I don't think this is 100% guaranteed if fuse sends the release request
asynchronously rather than synchronously (eg the request gets stalled
on the bg queue if active_background >= max_background; see the sketch
at the end of this mail).

> where this part of the thread went, because why would that happen? And
> why would the fuse server hold a lock across requests?)

The fuse server holding a lock across requests example was a contrived
one to illustrate that an async release could be racy if a fuse server
implementation has the (standard?) expectation that releases and opens
are always received in order.

>
> > However now that you pointed out FUSE_FLUSH gets sent on the release
> > path, that addresses my worry about async FUSE_RELEASE returning
> > before the server has gotten a chance to write out its local buffer
> > cache.
>
> <nod>
>
> --D
>
> > Thanks,
> > Joanne
> >
> > > Thanks,
> > > Amir.
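
P.S. For reference, here is the bg queue gating I mean above. This is a
simplified paraphrase of flush_bg_queue() in fs/fuse/dev.c, not the
verbatim kernel code (in particular, queue_request() below is a
stand-in for the real enqueue path): background requests, which include
FUSE_RELEASE, sit on fc->bg_queue and only become readable by the
server while active_background < max_background.

/*
 * Simplified paraphrase of flush_bg_queue() in fs/fuse/dev.c.
 * Background requests (including FUSE_RELEASE) wait on
 * fc->bg_queue and are only moved to the input queue, where the
 * fuse server can read them, while the number of in-flight
 * background requests is below fc->max_background.
 */
static void flush_bg_queue(struct fuse_conn *fc)
{
	while (fc->active_background < fc->max_background &&
	       !list_empty(&fc->bg_queue)) {
		struct fuse_req *req;

		req = list_first_entry(&fc->bg_queue, struct fuse_req, list);
		list_del(&req->list);
		fc->active_background++;
		/*
		 * Only at this point does the request become visible
		 * to the fuse server; until then it is invisible on
		 * bg_queue.  (queue_request() is a hypothetical
		 * stand-in for the real enqueue helper.)
		 */
		queue_request(fc, req);
	}
}

So if max_background worth of background requests are already in
flight, the FUSE_RELEASE sits on bg_queue while a synchronous request
like the subsequent FUSE_OPEN goes straight to the input queue, and the
server can observe the open before the release.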