On 8/1/25 12:15, Luis Henriques wrote:
> On Thu, Jul 31 2025, Darrick J. Wong wrote:
>
>> On Thu, Jul 31, 2025 at 09:04:58AM -0400, Theodore Ts'o wrote:
>>> On Tue, Jul 29, 2025 at 04:38:54PM -0700, Darrick J. Wong wrote:
>>>>
>>>> Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
>>>> could restart itself.  It's unclear if doing so will actually enable us
>>>> to clear the condition that caused the failure in the first place, but I
>>>> suppose fuse2fs /does/ have e2fsck -fy at hand.  So maybe restarts
>>>> aren't totally crazy.
>>>
>>> I'm trying to understand what the failure scenario is here.  Is this
>>> if the userspace fuse server (i.e., fuse2fs) has crashed?  If so, what
>>> is supposed to happen with respect to open files, metadata and data
>>> modifications which were in transit, etc.?  Sure, fuse2fs could run
>>> e2fsck -fy, but if there are dirty inodes on the system, that's going
>>> potentially to be out of sync, right?
>>>
>>> What are the recovery semantics that we hope to be able to provide?
>>
>> <echoing what we said on the ext4 call this morning>
>>
>> With iomap, most of the dirty state is in the kernel, so I think the new
>> fuse2fs instance would poke the kernel with FUSE_NOTIFY_RESTARTED, which
>> would initiate GETATTR requests on all the cached inodes to validate
>> that they still exist; and then resend all the unacknowledged requests
>> that were pending at the time.  It might be the case that you have to
>> do that in the reverse order; I only know enough about the design of fuse
>> to suspect that to be true.
>>
>> Anyhow once those are complete, I think we can resume operations with
>> the surviving inodes.  The ones that fail the GETATTR revalidation are
>> fuse_make_bad'd, which effectively revokes them.
>
> Ah! Interesting, I have been playing a bit with sending LOOKUP requests,
> but probably GETATTR is a better option.
>
> So, are you currently working on any of this?
> Are you implementing this new NOTIFY_RESTARTED request?  I guess it's
> time for me to have a closer look at fuse2fs too.

Sorry for joining the discussion late, I was totally occupied, day and
night. Added Kevin to CC, who is going to work on recovery on our
DDN side.

The issue with GETATTR and LOOKUP is that they need a path, but on fuse
server restart we want the kernel to recover inodes and their lookup
count. Now inode recovery might be hard, because we currently only have
a 64-bit node-id, which is used by most fuse applications as a memory
pointer.

As Luis wrote, my issue with FUSE_NOTIFY_RESEND is that it just re-sends
outstanding requests. In most cases that ends up sending requests with
invalid node-IDs, which are cast to pointers and might provoke random
memory access on restart. It is kind of the same issue as why fuse nfs
export or open_by_handle_at doesn't work well right now.

So IMHO, what we really want is something like FUSE_LOOKUP_FH, which
would not return a 64-bit node ID, but a max 128 byte file handle.
And then FUSE_REVALIDATE_FH on server restart.
The file handles could be stored in the fuse inode and also used for
NFS export.

I *think* Amir had a similar idea, but I can't find the link quickly.
Adding Amir to CC.

Our short term plan is to add something like FUSE_NOTIFY_RESTART, which
will iterate over all superblock inodes and mark them with fuse_make_bad.
Any objections against that?


Thanks,
Bernd
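
PS: To make the file handle idea a bit more concrete, here is a minimal
userspace sketch of what handle-based revalidation could look like on
the server side. It is not libfuse or kernel code. FUSE_LOOKUP_FH and
FUSE_REVALIDATE_FH do not exist yet, and every name below (demo_fh,
demo_server, demo_server_add, demo_revalidate_fh) is made up for
illustration. Only the 128 byte limit comes from the proposal above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FH_MAX      128 /* max handle size suggested above */
#define TABLE_SLOTS 16  /* hypothetical, tiny on purpose */

/* Hypothetical variable-length file handle, opaque to the kernel. */
struct demo_fh {
    uint8_t len;
    uint8_t data[FH_MAX];
};

/* Hypothetical per-object state that a restarted server rebuilds. */
struct demo_inode {
    struct demo_fh fh;
    int in_use;
};

struct demo_server {
    struct demo_inode slots[TABLE_SLOTS];
};

/* Register an object the (re)started server knows about.
 * Returns the slot index, or -1 if the table is full. */
static int demo_server_add(struct demo_server *s, const void *data, size_t len)
{
    assert(len > 0 && len <= FH_MAX);
    for (int i = 0; i < TABLE_SLOTS; i++) {
        if (!s->slots[i].in_use) {
            s->slots[i].fh.len = (uint8_t)len;
            memcpy(s->slots[i].fh.data, data, len);
            s->slots[i].in_use = 1;
            return i;
        }
    }
    return -1;
}

/* Revalidate a handle by comparing bytes. Unlike a node ID that is
 * cast to a pointer, a stale handle is never dereferenced; it just
 * fails the lookup. Returns the slot index, or -1 if stale. */
static int demo_revalidate_fh(const struct demo_server *s,
                              const struct demo_fh *fh)
{
    if (fh->len == 0 || fh->len > FH_MAX)
        return -1;
    for (int i = 0; i < TABLE_SLOTS; i++) {
        const struct demo_inode *ino = &s->slots[i];
        if (ino->in_use && ino->fh.len == fh->len &&
            memcmp(ino->fh.data, fh->data, fh->len) == 0)
            return i;
    }
    return -1;
}
```

In this sketch, the inodes whose handles fail revalidation are the ones
that would be marked with fuse_make_bad on the kernel side.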