On Fri, Sep 12, 2025 at 12:31 PM Bernd Schubert <bernd@xxxxxxxxxxx> wrote:
>
> On 8/1/25 12:15, Luis Henriques wrote:
> > On Thu, Jul 31 2025, Darrick J. Wong wrote:
> >
> >> On Thu, Jul 31, 2025 at 09:04:58AM -0400, Theodore Ts'o wrote:
> >>> On Tue, Jul 29, 2025 at 04:38:54PM -0700, Darrick J. Wong wrote:
> >>>>
> >>>> Just speaking for fuse2fs here -- that would be kinda nifty if libfuse
> >>>> could restart itself. It's unclear if doing so will actually enable us
> >>>> to clear the condition that caused the failure in the first place, but
> >>>> I suppose fuse2fs /does/ have e2fsck -fy at hand. So maybe restarts
> >>>> aren't totally crazy.
> >>>
> >>> I'm trying to understand what the failure scenario is here. Is this
> >>> if the userspace fuse server (i.e., fuse2fs) has crashed? If so, what
> >>> is supposed to happen with respect to open files, metadata and data
> >>> modifications which were in transit, etc.? Sure, fuse2fs could run
> >>> e2fsck -fy, but if there are dirty inodes on the system, that's
> >>> potentially going to be out of sync, right?
> >>>
> >>> What are the recovery semantics that we hope to be able to provide?
> >>
> >> <echoing what we said on the ext4 call this morning>
> >>
> >> With iomap, most of the dirty state is in the kernel, so I think the new
> >> fuse2fs instance would poke the kernel with FUSE_NOTIFY_RESTARTED, which
> >> would initiate GETATTR requests on all the cached inodes to validate
> >> that they still exist; and then resend all the unacknowledged requests
> >> that were pending at the time. It might be the case that you have to do
> >> that in the reverse order; I only know enough about the design of fuse
> >> to suspect that to be true.
> >>
> >> Anyhow, once those are complete, I think we can resume operations with
> >> the surviving inodes. The ones that fail the GETATTR revalidation are
> >> fuse_make_bad'd, which effectively revokes them.
> >
> > Ah! Interesting, I have been playing a bit with sending LOOKUP requests,
> > but probably GETATTR is a better option.
> >
> > So, are you currently working on any of this? Are you implementing this
> > new NOTIFY_RESTARTED request? I guess it's time for me to have a closer
> > look at fuse2fs too.
>
> Sorry for joining the discussion late, I was totally occupied, day and
> night. Added Kevin to CC, who is going to work on recovery on our
> DDN side.
>
> The issue with GETATTR and LOOKUP is that they need a path, but on fuse
> server restart we want the kernel to recover inodes and their lookup
> count. Inode recovery might be hard, because we currently only have a
> 64-bit node-id, which is used by most fuse applications as a memory
> pointer.
>
> As Luis wrote, my issue with FUSE_NOTIFY_RESEND is that it just re-sends
> outstanding requests. In most cases that ends up sending requests with
> invalid node-IDs, which are cast back to pointers and might provoke
> random memory accesses after a restart. It is much the same issue that
> keeps fuse NFS export and open_by_handle_at from working well right now.
>
> So IMHO, what we really want is something like FUSE_LOOKUP_FH, which
> would not return a 64-bit node ID, but a file handle of at most 128
> bytes, plus FUSE_REVALIDATE_FH on server restart. The file handles
> could be stored in the fuse inode and also used for NFS export.
>
> I *think* Amir had a similar idea, but I don't find the link quickly.
> Adding Amir to CC. Or maybe it was Miklos' idea.

Hard to keep track of this rolling thread:
https://lore.kernel.org/linux-fsdevel/CAJfpegvNZ6Z7uhuTdQ6quBaTOYNkAP8W_4yUY4L2JRAEKxEwOQ@xxxxxxxxxxxxxx/
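To make the discussion concrete, the wire format for a handle-based
lookup could look roughly like the sketch below. To be clear, every
name and size here is made up for the sake of argument
(FUSE_MAX_FH_SIZE, fuse_entry_fh_out, fuse_revalidate_fh_in); only
struct fuse_attr is the existing struct from <linux/fuse.h>:

	#define FUSE_MAX_FH_SIZE 128

	/* Hypothetical reply to FUSE_LOOKUP_FH: like fuse_entry_out,
	 * but the inode is identified by an opaque file handle that
	 * survives a server restart, instead of a 64-bit node-id that
	 * is really a server-side memory pointer. */
	struct fuse_entry_fh_out {
		struct fuse_attr attr;
		uint32_t	fh_len;		/* bytes used in fh[] */
		uint32_t	padding;
		uint8_t		fh[FUSE_MAX_FH_SIZE];	/* opaque to the kernel */
	};

	/* Hypothetical FUSE_REVALIDATE_FH request: after a restart the
	 * kernel hands the stored handle back, and the server either
	 * re-establishes the inode, restoring its lookup count, or
	 * fails, in which case the kernel revokes the inode. */
	struct fuse_revalidate_fh_in {
		uint64_t	lookup_count;	/* count to restore */
		uint32_t	fh_len;
		uint32_t	padding;
		uint8_t		fh[FUSE_MAX_FH_SIZE];
	};

The point being that the handle, unlike the node-id, means the same
thing to the old and the new server instance.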
> Our short term plan is to add something like FUSE_NOTIFY_RESTART, which
> will iterate over all superblock inodes and mark them with
> fuse_make_bad. Any objections against that?

IDK, it seems much uglier than implementing LOOKUP_HANDLE, and I am not
sure that LOOKUP_HANDLE is that much harder to implement compared to
this alternative.

I mean, a restartable server is going to be a new implementation anyway,
right? So it makes sense to start with a cleaner and more adequate
protocol, does it not?

Thanks,
Amir.
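P.S. For the archives, here is roughly what I understand the proposed
FUSE_NOTIFY_RESTART handler would do on the kernel side. This is an
untested sketch only; fuse_notify_restart() is a made-up name, while
fuse_make_bad(), s_inodes and s_inode_list_lock are the existing
primitives:

	static void fuse_notify_restart(struct super_block *sb)
	{
		struct inode *inode;

		spin_lock(&sb->s_inode_list_lock);
		list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
			/* A real version would also need to skip inodes
			 * in I_NEW/I_FREEING state, and probably the
			 * root inode. */
			fuse_make_bad(inode);	/* later ops get -EIO */
		}
		spin_unlock(&sb->s_inode_list_lock);
	}

Every cached inode gets revoked wholesale, surviving or not, which is
exactly why this looks so much cruder to me than revalidating each
inode via its file handle.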