On Wed, 2025-07-23 at 08:37 -0700, Darrick J. Wong wrote:
> On Tue, Jul 22, 2025 at 08:38:08AM -0400, Jeff Layton wrote:
> > On Tue, 2025-07-22 at 08:30 -0400, Jeff Layton wrote:
> > > On Fri, 2025-07-18 at 17:32 -0700, Darrick J. Wong wrote:
> > > > On Fri, Jul 18, 2025 at 03:23:30PM -0700, Joanne Koong wrote:
> > > > > On Thu, Jul 17, 2025 at 4:26 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > > > >
> > > > > > generic/488 fails with fuse2fs in the following fashion:
> > > > > >
> > > > > > generic/488	_check_generic_filesystem: filesystem on /dev/sdf is inconsistent
> > > > > > (see /var/tmp/fstests/generic/488.full for details)
> > > > > >
> > > > > > This test opens a large number of files, unlinks them (which really just
> > > > > > renames them to fuse hidden files), closes the program, unmounts the
> > > > > > filesystem, and runs fsck to check that there aren't any inconsistencies
> > > > > > in the filesystem.
> > > > > >
> > > > > > Unfortunately, the 488.full file shows that there are a lot of hidden
> > > > > > files left over in the filesystem, with incorrect link counts. Tracing
> > > > > > fuse_request_* shows that there are a large number of FUSE_RELEASE
> > > > > > commands that are queued up on behalf of the unlinked files at the time
> > > > > > that fuse_conn_destroy calls fuse_abort_conn. Had the connection not
> > > > > > aborted, the fuse server would have responded to the RELEASE commands by
> > > > > > removing the hidden files; instead they stick around.
> > > > >
> > > > > Tbh it's still weird to me that FUSE_RELEASE is asynchronous instead
> > > > > of synchronous. For example, for fuse servers that cache their data and
> > > > > only write the buffer out to some remote filesystem when the file gets
> > > > > closed, it seems useful for them to (like nfs) be able to return an
> > > > > error to the client for close() if there's a failure committing that
> > > >
> > > > I don't think supplying a return value for close() is as helpful as it
> > > > seems -- the manpage says that there is no guarantee that data has been
> > > > flushed to disk; and if the file is removed from the process' fd table
> > > > then the operation succeeded no matter the return value. :P
> > > >
> > > > (Also C programmers tend to be sloppy and not check the return value.)
> > >
> > > The POSIX spec and manpage for close(2) make no mention of writeback
> > > errors, so it's not 100% clear that returning them there is at all OK.
> > > Everyone sort of assumes that it makes sense to do so, but it can be
> > > actively harmful.
> >
> > Actually, they do mention this, but I still argue that it's not a good
> > idea to do so. If you want writeback errors, use fsync() (or maybe the
> > new ioctl() that someone was plumbing in that scrapes errors without
> > doing writeback).
> >
> > > Suppose we do this:
> > >
> > > open() = 1
> > > write(1)
> > > close(1)
> > > open() = 2
> > > fsync(2) = ???
> > >
> > > Now, assume there was a writeback error that happens either before or
> > > after the close.
> > >
> > > With the way this works today, you will get back an error on that final
> > > fsync() even if fd 2 was opened _after_ the writeback error occurred,
> > > because nothing will have scraped it yet.
> > >
> > > If you scrape the error to return it on the close though, then the
> > > result of that fsync() would be inconclusive. If the error happens
> > > before the close(), then fsync() will return 0. If it fails after the
> > > close(), then the fsync() will see an error.
>
> <nod> Given the horrible legacy of C programmers not really checking the
> return value from close(), I think that /if/ the kernel is going to
> check for writeback errors at close, it should sample the error state
> but not clear it, so that the fsync returns accumulated errors.
>
> (That said, my opinion is that after years of all of us telling
> programmers that fsync is the golden standard for checking if bad stuff
> happened, we really ought only to be clearing error state during fsync.)
>

That is pretty doable. The only question is whether it's something we
*want* to do. Something like this would probably be enough if so:

diff --git a/fs/open.c b/fs/open.c
index 7828234a7caa..a20657a85ee1 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1582,6 +1582,10 @@ SYSCALL_DEFINE1(close, unsigned int, fd)
 
 	retval = filp_flush(file, current->files);
 
+	/* Do an opportunistic writeback error check before returning. */
+	if (likely(retval == 0))
+		retval = filemap_check_wb_err(file_inode(file)->i_mapping, file->f_wb_err);
+
 	/*
 	 * We're returning to user space. Don't bother
 	 * with any delayed fput() cases.
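FWIW, filemap_check_wb_err() there is just a wrapper around
errseq_check(), so close() would report the error without marking it
seen, while fsync() still consumes it via
file_check_and_advance_wb_err(). A toy illustration of those semantics
(errseq_demo() is mine, purely for illustration, not kernel code):

#include <linux/errno.h>
#include <linux/errseq.h>

static void errseq_demo(void)
{
	errseq_t wb_err = 0;			/* stands in for mapping->wb_err */
	errseq_t since = errseq_sample(&wb_err);/* as sampled at open() */
	int err;

	errseq_set(&wb_err, -EIO);		/* a writeback failure lands */

	/* close() with the hunk above: reports -EIO, doesn't mark it seen */
	err = errseq_check(&wb_err, since);		/* -EIO */

	/* a later fsync() still observes, and then consumes, the error */
	err = errseq_check_and_advance(&wb_err, &since);	/* -EIO */
	err = errseq_check_and_advance(&wb_err, &since);	/* now 0 */
	(void)err;
}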
> Evidently some projects do fsync-after-open assuming that close doesn't
> flush and wait for writeback:
> https://despairlabs.com/blog/posts/2025-03-13-fsync-after-open-is-an-elaborate-no-op/
>
> > > > > data; that also has clearer API semantics imo, eg users are guaranteed
> > > > > that when close() returns, all the processing/cleanup for that file
> > > > > has been completed. Async FUSE_RELEASE also seems kind of racy, eg if
> > > > > the server holds local locks that get released in FUSE_RELEASE, if a
> > > >
> > > > Yes. I think it's only useful for the case outlined in that patch, which
> > > > is that a program started an asyncio operation and then closed the fd.
> > > > In that particular case the program unambiguously doesn't care about the
> > > > return value of close so it's ok to perform the release asynchronously.
> > > >
> > > > > subsequent FUSE_OPEN happens before FUSE_RELEASE and then depends on
> > > > > grabbing that lock, then we end up deadlocked if the server is
> > > > > single-threaded.
> > > >
> > > > Hrm. I suppose if you had a script that ran two programs one after the
> > > > other, each of which expected to be able to open and lock the same file,
> > > > then you could run into problems if the lock isn't released by the time
> > > > the second program is ready to open the file.
> > > >
> > > > But having said that, some other program could very well open and lock
> > > > the file as soon as the lock drops.
> > > >
> > > > > I saw in your first patch that sending FUSE_RELEASE synchronously
> > > > > leads to a deadlock under AIO but AFAICT, that happens because we
> > > > > execute req->args->end() in fuse_request_end() synchronously; I think
> > > > > if we execute that release asynchronously on a worker thread then that
> > > > > gets rid of the deadlock.
> > > >
> > > > <nod> Last time I think someone replied that maybe they should all be
> > > > asynchronous.
> > > >
> > > > > If FUSE_RELEASE must be asynchronous though, then your approach makes
> > > > > sense to me.
> > > >
> > > > I think it only has to be asynchronous for the weird case outlined in
> > > > that patch (fuse server gets stuck closing its own client's fds).
> > > > Personally I think release ought to be synchronous, at least as far as
> > > > the kernel doing all the stuff that close() says it has to do (removal
> > > > of record locks, deleting the fd table entry).
> > > >
> > > > Note that doesn't necessarily mean that the kernel has to be completely
> > > > done with all the work that entails. XFS defers freeing of unlinked
> > > > files until a background garbage collector gets around to doing that.
> > > > Other filesystems will actually make you wait while they free all the
> > > > data blocks and the inode. But the kernel has no idea what the fuse
> > > > server actually does.
> > > >
> > > > > > Create a function to push all the background requests to the queue and
> > > > > > then wait for the number of pending events to hit zero, and call this
> > > > > > before fuse_abort_conn. That way, all the pending events are processed
> > > > > > by the fuse server and we don't end up with a corrupt filesystem.
> > > > > >
> > > > > > Signed-off-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>
> > > > > > ---
> > > > > >  fs/fuse/fuse_i.h |    6 ++++++
> > > > > >  fs/fuse/dev.c    |   38 ++++++++++++++++++++++++++++++++++++++
> > > > > >  fs/fuse/inode.c  |    1 +
> > > > > >  3 files changed, 45 insertions(+)
> > > > > >
> > > > > > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > > > > > +/*
> > > > > > + * Flush all pending requests and wait for them. Only call this function when
> > > > > > + * it is no longer possible for other threads to add requests.
> > > > > > + */
> > > > > > +void fuse_flush_requests(struct fuse_conn *fc, unsigned long timeout)
> > > > >
> > > > > It might be worth renaming this to something like
> > > > > 'fuse_flush_bg_requests' to make it more clear that this only flushes
> > > > > background requests
> > > >
> > > > Hum. Did I not understand the code correctly? I thought that
> > > > flush_bg_queue puts all the background requests onto the active queue
> > > > and issues them to the fuse server, and the wait_event_timeout sits
> > > > around waiting for all the requests to receive their replies?
> > > >
> > > > I could be mistaken though. This is my rough understanding of what
> > > > happens to background requests:
> > > >
> > > > 1. Request created
> > > > 2. Put request on bg_queue
> > > > 3. <wait>
> > > > 4. Request removed from bg_queue
> > > > 5. Request sent
> > > > 6. <wait>
> > > > 7. Reply received
> > > > 8. Request ends and is _put.
> > > >
> > > > Non-background (foreground?) requests skip steps 2-4. Meanwhile,
> > > > fc->waiting tracks the number of requests that are anywhere between the
> > > > end of step 1 and the start of step 8.
> > > >
> > > > In any case, I want to push all the bg requests and wait until there are
> > > > no more requests in the system.
> > > >
> > > > --D
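FWIW, the shape I'd expect fuse_flush_requests() to have, sitting in
fs/fuse/dev.c next to flush_bg_queue(), is roughly the sketch below. The
actual body was snipped above, so this is only my guess at it: the
max_background bump and the blocked_waitq wakeup in particular are
assumptions, since today fuse_drop_waiting() only wakes that waitqueue
once the connection is no longer connected:

/*
 * Sketch only: push every queued background request to the server, then
 * wait for fc->num_waiting to drop to zero (all requests answered).
 */
void fuse_flush_requests(struct fuse_conn *fc, unsigned long timeout)
{
	spin_lock(&fc->bg_lock);
	/* lift the throttle so flush_bg_queue() drains bg_queue completely */
	fc->max_background = UINT_MAX;
	flush_bg_queue(fc);
	spin_unlock(&fc->bg_lock);

	/*
	 * Assumes fuse_drop_waiting() is taught to wake fc->blocked_waitq
	 * whenever num_waiting reaches zero, not just after an abort.
	 */
	wait_event_timeout(fc->blocked_waitq,
			   atomic_read(&fc->num_waiting) == 0,
			   timeout);
}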
> > --
> > Jeff Layton <jlayton@xxxxxxxxxx>

-- 
Jeff Layton <jlayton@xxxxxxxxxx>