On Wed, Jul 23, 2025 at 2:02 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>
> On Wed, Jul 23, 2025 at 10:34 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> >
> > On Mon, Jul 21, 2025 at 01:32:43PM -0700, Joanne Koong wrote:
> > > On Fri, Jul 18, 2025 at 5:32 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > > >
> > > > On Fri, Jul 18, 2025 at 03:23:30PM -0700, Joanne Koong wrote:
> > > > > On Thu, Jul 17, 2025 at 4:26 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > > > >
> > > > > > generic/488 fails with fuse2fs in the following fashion:
> > > > > >
> > > > > > generic/488 _check_generic_filesystem: filesystem on /dev/sdf is inconsistent
> > > > > > (see /var/tmp/fstests/generic/488.full for details)
> > > > > >
> > > > > > This test opens a large number of files, unlinks them (which really just
> > > > > > renames them to fuse hidden files), closes the program, unmounts the
> > > > > > filesystem, and runs fsck to check that there aren't any inconsistencies
> > > > > > in the filesystem.
> > > > > >
> > > > > > Unfortunately, the 488.full file shows that there are a lot of hidden
> > > > > > files left over in the filesystem, with incorrect link counts. Tracing
> > > > > > fuse_request_* shows that there are a large number of FUSE_RELEASE
> > > > > > commands that are queued up on behalf of the unlinked files at the time
> > > > > > that fuse_conn_destroy calls fuse_abort_conn. Had the connection not
> > > > > > aborted, the fuse server would have responded to the RELEASE commands by
> > > > > > removing the hidden files; instead they stick around.
> > > > > >
> > > > > Tbh it's still weird to me that FUSE_RELEASE is asynchronous instead
> > > > > of synchronous. For example, for fuse servers that cache their data and
> > > > > only write the buffer out to some remote filesystem when the file gets
> > > > > closed, it seems useful for them to (like nfs) be able to return an
> > > > > error to the client for close() if there's a failure committing that
> > > >
> > > > I don't think supplying a return value for close() is as helpful as it
> > > > seems -- the manpage says that there is no guarantee that data has been
> > > > flushed to disk; and if the file is removed from the process' fd table
> > > > then the operation succeeded no matter the return value. :P
> > > >
> > > > (Also C programmers tend to be sloppy and not check the return value.)
> > >
> > > Amir pointed out FUSE_FLUSH gets sent on the FUSE_RELEASE path so that
> > > addresses my worry. FUSE_FLUSH is sent synchronously (and close() will
> > > propagate any flush errors too), so now if there's an abort or
> > > something right after close() returns, the client is guaranteed that
> > > any data they wrote into a local cache has been flushed by the server.
> >
> > <nod>
> >
> > > >
> > > > > data; that also has clearer API semantics imo, eg users are guaranteed
> > > > > that when close() returns, all the processing/cleanup for that file
> > > > > has been completed. Async FUSE_RELEASE also seems kind of racy, eg if
> > > > > the server holds local locks that get released in FUSE_RELEASE, if a
> > > >
> > > > Yes. I think it's only useful for the case outlined in that patch, which
> > > > is that a program started an asyncio operation and then closed the fd.
> > > > In that particular case the program unambiguously doesn't care about the
> > > > return value of close so it's ok to perform the release asynchronously.
> > >
> > > I wonder why fuseblk devices need to be synchronously released. The
> > > comment says "Make the release synchronous if this is a fuseblk
> > > mount, synchronous RELEASE is allowed (and desirable)". Why is it
> > > desirable?
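A toy userspace model of the point Amir raised (not kernel code; every name below is hypothetical): on a non-fuseblk mount, close() waits for the server's FLUSH reply and can return its error, while RELEASE is only queued, so nothing the server does while handling RELEASE can reach close()'s return value. The sketch assumes the close() in question drops the last reference to the open file, which is when RELEASE gets sent at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of close() on a non-fuseblk FUSE mount: FLUSH is a
 * synchronous round trip whose error close() can report, RELEASE is only
 * queued. Assumes this close() drops the last reference to the file.
 */
struct model_server {
	int flush_error;        /* reply the server gives to FLUSH (0 or -errno) */
	size_t queued_releases; /* RELEASEs queued but not yet answered */
};

/* Synchronous request: the caller blocks until the server replies. */
static int model_send_flush(const struct model_server *srv)
{
	return srv->flush_error;
}

/* Background request: queued, and no reply is waited for. */
static void model_queue_release(struct model_server *srv)
{
	srv->queued_releases++;
}

/* close(): only the FLUSH outcome can reach the caller. */
int model_close(struct model_server *srv)
{
	assert(srv != NULL);
	int err = model_send_flush(srv);

	model_queue_release(srv);
	return err;
}
```

In this model, a caching server that reports its write-back failure from FLUSH gets that error back to the application; the same failure reported from RELEASE would be lost.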
> >
> > Err, which are you asking about?
> >
> > Are you asking why it is that fuseblk mounts call FUSE_DESTROY from
> > unmount instead of letting libfuse synthesize it once the event loop
> > terminates? I think that's because in the fuseblk case, the kernel has
> > the block device open for itself, so the fuse server must write and
> > flush all dirty data before the unmount() returns to the caller.
> >
> > Or were you asking why synchronous RELEASE is done on fuseblk
> > filesystems? Here is my speculation:
> >
> > Synchronous RELEASE was added back in commit 5a18ec176c934c ("fuse: fix
> > hang of single threaded fuseblk filesystem"). I /think/ the idea behind
> > that patch was that for fuseblk servers, we're ok with issuing a
> > FUSE_DESTROY request from the kernel and waiting on it.
> >
> > However, for that to work correctly, all previous pending requests
> > anywhere in the fuse mount have to be flushed to and completed by the
> > fuse server before we can send DESTROY, because destroy closes the
> > filesystem.
> >
> > So I think the idea behind 5a18ec176c934c is that we make FUSE_RELEASE
> > synchronous so it's not possible to umount(8) until all the release
> > requests are finished.
>
> Thanks for the explanation. With the fix you added in this patch then,
> it seems there's no reason fuseblk RELEASE requests shouldn't now also be
> asynchronous, since your fix ensures that all pending requests have
> been flushed and completed before issuing the DESTROY.
>
> > > >
> > > > > subsequent FUSE_OPEN happens before FUSE_RELEASE then depends on
> > > > > grabbing that lock, then we end up deadlocked if the server is
> > > > > single-threaded.
> > > >
> > > > Hrm. I suppose if you had a script that ran two programs one after the
> > > > other, each of which expected to be able to open and lock the same file,
> > > > then you could run into problems if the lock isn't released by the time
> > > > the second program is ready to open the file.
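The ordering constraint Darrick describes (every pending request, RELEASE included, must be answered before the filesystem is torn down) can be illustrated with a toy, single-threaded model of the generic/488 failure. The names are hypothetical, not kernel functions: "abort" here simply discards queued RELEASEs unanswered, and "drain" stands in for flushing and waiting before fuse_abort_conn as this patch proposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the generic/488 failure: each unlinked-but-open
 * file is a hidden file on disk that the server deletes only when it
 * sees that file's RELEASE.
 */
struct model_mount {
	size_t hidden_files;     /* hidden files still on disk */
	size_t pending_releases; /* RELEASEs sent but not yet answered */
};

/* Server answers one RELEASE by removing the corresponding hidden file. */
static void model_server_release(struct model_mount *m)
{
	assert(m->pending_releases > 0 && m->hidden_files > 0);
	m->pending_releases--;
	m->hidden_files--;
}

/*
 * Unmount. With drain == true every queued RELEASE is answered first;
 * with drain == false the connection is aborted and the queued RELEASEs
 * are thrown away unanswered. Returns the number of hidden files left
 * behind for fsck to complain about.
 */
size_t model_unmount(struct model_mount *m, bool drain)
{
	if (drain) {
		while (m->pending_releases > 0)
			model_server_release(m);
	} else {
		m->pending_releases = 0; /* abort: requests dropped */
	}
	return m->hidden_files;
}
```

Whether the drain comes from making RELEASE synchronous (the fuseblk approach) or from flushing and waiting at unmount (this patch), the model ends with no hidden files, which is the basis for Joanne's suggestion that fuseblk RELEASE could become asynchronous too.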
> > >
> > > I think in your scenario with the two programs, the worst outcome is
> > > that the open/lock acquiring can take a while but in the (contrived
> > > and probably far-fetched) scenario where it's single threaded, it
> > > would result in a complete deadlock.
> >
> > <nod> I concede it's a minor point. :)
> >
> > > > But having said that, some other program could very well open and lock
> > > > the file as soon as the lock drops.
> > > >
> > > > > I saw in your first patch that sending FUSE_RELEASE synchronously
> > > > > leads to a deadlock under AIO but AFAICT, that happens because we
> > > > > execute req->args->end() in fuse_request_end() synchronously; I think
> > > > > if we execute that release asynchronously on a worker thread then that
> > > > > gets rid of the deadlock.
> > > >
> > > > <nod> Last time I think someone replied that maybe they should all be
> > > > asynchronous.
> > > >
> > > > > If FUSE_RELEASE must be asynchronous though, then your approach makes
> > > > > sense to me.
> > > >
> > > > I think it only has to be asynchronous for the weird case outlined in
> > > > that patch (fuse server gets stuck closing its own client's fds).
> > > > Personally I think release ought to be synchronous at least as far as
> > > > the kernel doing all the stuff that close() says it has to do (removal
> > > > of record locks, deleting the fd table entry).
> > > >
> > > > Note that doesn't necessarily mean that the kernel has to be completely
> > > > done with all the work that entails. XFS defers freeing of unlinked
> > > > files until a background garbage collector gets around to doing that.
> > > > Other filesystems will actually make you wait while they free all the
> > > > data blocks and the inode. But the kernel has no idea what the fuse
> > > > server actually does.
> > >
> > > I guess if that's important enough to the server, we could add
> > > an FOPEN flag for that, which servers could set on the file
> > > handle if they want synchronous release?
> >
> > If a fuse server /did/ have background garbage collection, there are a
> > few things it could do -- every time it sees a FUSE_RELEASE of an
> > unlinked file, it could set a timer (say 50ms) after which it would kick
> > the gc thread to do its thing. Or it could wake up the background
> > thread in response to a FUSE_SYNCFS command and hope it finishes by the
> > time FUSE_DESTROY comes around.
> >
> > (Speaking of which, can we enable syncfs for all fuse servers?)
>
> I'm not sure what you mean by this - I thought the implementation of
> FUSE_SYNCFS is dependent on each server's logic depending on if
> they've set a callback for it or not? Speaking of which, it doesn't
> look like FUSE_SYNCFS support has been added to libfuse yet.
>
> >
> > But that said, not everyone wants the fancy background gc stuff that XFS
> > does. FUSE_RELEASE would then be doing a lot of work.
> >
> > > after Amir's point about FUSE_FLUSH, I'm in favor now of FUSE_RELEASE
> > > being asynchronous.
> > >
> > > > > > Create a function to push all the background requests to the queue and
> > > > > > then wait for the number of pending events to hit zero, and call this
> > > > > > before fuse_abort_conn. That way, all the pending events are processed
> > > > > > by the fuse server and we don't end up with a corrupt filesystem.
> > > > > >
> > > > > > Signed-off-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>
> > > > > > ---
> > > > > > fs/fuse/fuse_i.h | 6 ++++++
> > > > > > fs/fuse/dev.c | 38 ++++++++++++++++++++++++++++++++++++++
> > > > > > fs/fuse/inode.c | 1 +
> > > > > > 3 files changed, 45 insertions(+)
> > > > > >
> > > > > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > > > > > +/*
> > > > > > + * Flush all pending requests and wait for them. Only call this function when
> > > > > > + * it is no longer possible for other threads to add requests.
> > > > > > + */
> > > > > > +void fuse_flush_requests(struct fuse_conn *fc, unsigned long timeout)
> > > > > >
> > > > > It might be worth renaming this to something like
> > > > > 'fuse_flush_bg_requests' to make it more clear that this only flushes
> > > > > background requests
> > > >
> > > > Hum. Did I not understand the code correctly? I thought that
> > > > flush_bg_queue puts all the background requests onto the active queue
> > > > and issues them to the fuse server; and the wait_event_timeout sits
> > > > around waiting for all the requests to receive their replies?
> > >
> > > Sorry, didn't mean to be confusing with my previous comment. What I
> > > was trying to say is that "fuse_flush_requests" implies that all
> > > requests get flushed to userspace but here only the background
> > > requests get flushed.
> >
> > Oh, I see now, I /was/ mistaken. Synchronous requests are ...
> >
> > Wait, no, still confused :(
> >
> > fuse_flush_requests waits until fuse_conn::num_waiting is zero.
> >
> > Synchronous requests (aka the ones sent through fuse_simple_request)
> > bump num_waiting either directly in the args->force case or indirectly
> > via fuse_get_req. num_waiting is decremented in fuse_put_request.
> > Therefore waiting for num_waiting to hit zero implements waiting for all
> > the requests that were in flight before fuse_flush_requests was called.
> >
> > Background requests (aka the ones sent via fuse_simple_background) have
> > num_waiting bumped in the !args->force case or indirectly in
> > fuse_request_queue_background. num_waiting is decremented in
> > fuse_put_request the same as is done for synchronous requests.
> >
> > Therefore, it's correct to say that waiting for num_waiting to become 0
> > is sufficient to wait for all pending requests anywhere in the
> > fuse_mount to complete.
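Darrick's counting argument (foreground and background requests both raise num_waiting when they are obtained and lower it in fuse_put_request, so num_waiting == 0 means nothing is outstanding) can be sketched in userspace with a mutex and a condition variable. This is an analogue under stated assumptions, not the patch: the patch uses flush_bg_queue and wait_event_timeout in the kernel, while the model_* functions and the pthread-based timeout below are hypothetical, and the max_background queue is not modelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical userspace stand-in for the num_waiting bookkeeping. */
struct model_conn {
	pthread_mutex_t lock;
	pthread_cond_t idle;      /* broadcast when num_waiting drops to 0 */
	unsigned int num_waiting; /* foreground + background requests alive */
};

void model_conn_init(struct model_conn *fc)
{
	pthread_mutex_init(&fc->lock, NULL);
	pthread_cond_init(&fc->idle, NULL);
	fc->num_waiting = 0;
}

/* Taken for every request, sent synchronously or in the background. */
void model_get_req(struct model_conn *fc)
{
	pthread_mutex_lock(&fc->lock);
	fc->num_waiting++;
	pthread_mutex_unlock(&fc->lock);
}

/* Dropped once the request's reply has been handled. */
void model_put_req(struct model_conn *fc)
{
	pthread_mutex_lock(&fc->lock);
	assert(fc->num_waiting > 0);
	if (--fc->num_waiting == 0)
		pthread_cond_broadcast(&fc->idle);
	pthread_mutex_unlock(&fc->lock);
}

/*
 * Wait up to timeout_ms for num_waiting to reach zero; returns true if it
 * did. Only meaningful once nobody can add new requests.
 */
bool model_wait_idle(struct model_conn *fc, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	bool drained;

	clock_gettime(CLOCK_REALTIME, &deadline); /* default condvar clock */
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&fc->lock);
	/* rc == 0 also covers spurious wakeups: re-check and keep waiting. */
	while (fc->num_waiting != 0 && rc == 0)
		rc = pthread_cond_timedwait(&fc->idle, &fc->lock, &deadline);
	drained = (fc->num_waiting == 0);
	pthread_mutex_unlock(&fc->lock);
	return drained;
}
```

As the patch comment warns, a zero count only means "drained" if nothing can submit new requests during the wait; otherwise the count could rise again right after the wait returns.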
>
> You're right, good point, waiting on fc->num_waiting == 0 also ensures
> foreground requests have been completed. Sorry for the confusion!
>
> Connections can also be aborted through the
> /sys/fs/fuse/connections/*/abort interface or through request timeouts
> (eg fuse_check_timeout()) - should those places too flush pending
> requests and wait for them before aborting the connection?
>

Or I guess just the FUSE_RELEASE requests, since those seem to be the only
ones that could lead to on-disk inconsistencies if they're not completed.
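The narrower option above, draining only RELEASE before the sysfs abort or a request-timeout abort, amounts to a different predicate for when an abort may proceed. A hypothetical sketch of the two predicates, assuming per-opcode pending counters; the thread only discusses the single num_waiting count, so these counters and names are an assumption, not something fuse is known to keep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode subset and per-opcode pending counters. */
enum model_opcode { MODEL_READ, MODEL_WRITE, MODEL_RELEASE, MODEL_NR_OPCODES };

enum model_drain_policy {
	MODEL_DRAIN_ALL,          /* every outstanding request must be answered */
	MODEL_DRAIN_RELEASE_ONLY, /* only RELEASEs, which can leave on-disk state */
};

struct model_pending {
	unsigned int count[MODEL_NR_OPCODES]; /* outstanding requests per opcode */
};

/* True once an abort may proceed under the given policy. */
bool model_ready_to_abort(const struct model_pending *p,
			  enum model_drain_policy policy)
{
	assert(policy == MODEL_DRAIN_ALL || policy == MODEL_DRAIN_RELEASE_ONLY);

	if (policy == MODEL_DRAIN_RELEASE_ONLY)
		return p->count[MODEL_RELEASE] == 0;

	for (int op = 0; op < MODEL_NR_OPCODES; op++)
		if (p->count[op] != 0)
			return false;
	return true;
}
```

Under the RELEASE-only predicate, a READ stuck in an unresponsive server does not hold up the abort, but a queued RELEASE still does. A hung server may never answer that RELEASE either, so any real version would still need a timeout like the one fuse_flush_requests() takes.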