Re: Reseting pending fanotify events

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 01, 2025 at 06:28:11PM +0200, Jan Kara wrote:
> On Mon 31-03-25 21:08:51, Amir Goldstein wrote:
> > [CC Jan and Josef]
> 
> CCed fsdevel. Actually replying here because the quoting in Ibrahim's email
> got somehow broken which made it very hard to understand.
> 
> > I am keeping this discussion private because you did not post it to
> > the public list,
> > but if you can CC fsdevel in your reply that would be great, because it seems
> > like a question with interest to a wider audience.
> > 
> > On Mon, Mar 31, 2025 at 8:19 PM Ibrahim Jirdeh <ibrahimjirdeh@xxxxxxxx> wrote:
> > >
> > > Hi Amir,
> > >
> > > We have been using fanotify to support lazily loading file contents.
> > > We are struggling with the problem that pending permission events cannot be recovered on daemon restart.
> > >
> > > We have a long-lived daemon that marks files with FAN_OPEN_PERM and populates their contents on access.
> > > It needs a reliable path for updates & crash recovery.
> > > The happy path for fanotify event processing is as follows:
> > >
> > > A notification is read from fanotify file descriptor
> > > File contents are populated
> > > We write back FAN_ALLOW to fanotify file descriptor, or DENY if content population failed.
> > >
> > > We would like to guarantee that all file accesses receive an ALLOW or DENY response, and no events are lost.
> > 
> > Makes sense.
> > 
> > > Unfortunately, today a filesystem client can hang (in D state)
> > > if the event-handler daemon crashes or restarts at the wrong time.
> > 
> > Can you provide exact stack traces for those cases?
> > 
> > I wonder how process gets to D state with commit fabf7f29b3e2
> > ("fanotify: Use interruptible wait when waiting for permission events")
> 
> So D state is expected when waiting for response. We are using
> TASK_UNINTERRUPTIBLE sleep (the above commit had to be effectively
> reverted). But we are also setting TASK_KILLABLE and TASK_FREEZABLE so that
> we don't block hibernation and tasks can be killed when fanotify listener
> misbehaves.
> 
> But what confuses me is the following: You have fanotify instance to which
> you've got fd from fanotify_init(). For any process to be hanging, this fd
> must be still held open by some process. Otherwise the fanotify instance
> gets destroyed and all processes are free to run (they get FAN_ALLOW reply
> if they were already waiting). So the fact that you see processes hanging
> when your fanotify listener crashes means that you have likely leaked the
> fd to some other process (lsof should be able to tell you which process has
> still handle to fanotify instance). And the kernel has no way to know this
> is not the process that will eventually read these events and reply...
> 
> > > In this case, any events that have been read but not yet responded to would be lost.
> > > Initially we considered handling this internally by saving the file descriptors for pending events,
> > > however this proved to be complex to do in a robust manner.
> > >
> > > A more robust solution is to add a kernel fanotify api which resets the fanotify pending event queue,
> > > thereby allowing us to recover pending events in the case of daemon restart.
> > > A strawman implementation of this approach is in
> > > https://github.com/torvalds/linux/compare/master...ibrahim-jirdeh:linux:fanotify-reset-pending,
> > > a new ioctl that resets `group->fanotify_data.access_list`.
> > > One other alternative we considered is directly exposing the pending event queue itself
> > > (https://github.com/torvalds/linux/commit/cd90ff006fa2732d28ff6bb5975ca5351ce009f1)
> > > to support monitoring pending events, but simply resetting the queue is likely sufficient for our use-case.
> > >
> > > What do you think of exposing this functionality in fanotify?
> > >
> > 
> > Ignoring the pending events for start, how do you deal with access to
> > non-populated files while the daemon is down?
> > 
> > We were throwing some idea about having a mount option (something
> > like a "moderate" mount) to determine the default response for specific
> > permission events (e.g. FAN_OPEN_PERM) in the case that there is
> > no listener watching this event.
> > 
> > If you have a filesystem which may contain non-populated files, you
> > mount it with as "moderated" mount and then access to all files is
> > denied until the daemon is running and also denied if daemon is down.
> > 
> > For restart, it might make sense to start a new daemon to start listening
> > to events before stopping the old daemon.
> > If the new daemon gets the events before the old daemon, things should
> > be able to transition smoothly.
> 
> I agree this would be a sensible protocol for updates. For unplanned crashes
> I agree we need something like the "moderated" mount option.

I hope you mean as a superblock option, not an actual mount option.
Because a per-mount option is not something I want us to do. We're not
giving a specific subsystem a way to alter per-mount behavior.




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux