Re: Resetting pending fanotify events

On Mon 31-03-25 21:08:51, Amir Goldstein wrote:
> [CC Jan and Josef]

CCed fsdevel. I am replying here because the quoting in Ibrahim's email
was somehow broken, which made it very hard to understand.

> I am keeping this discussion private because you did not post it to
> the public list,
> but if you can CC fsdevel in your reply that would be great, because it seems
> like a question with interest to a wider audience.
> 
> On Mon, Mar 31, 2025 at 8:19 PM Ibrahim Jirdeh <ibrahimjirdeh@xxxxxxxx> wrote:
> >
> > Hi Amir,
> >
> > We have been using fanotify to support lazily loading file contents.
> > We are struggling with the problem that pending permission events cannot be recovered on daemon restart.
> >
> > We have a long-lived daemon that marks files with FAN_OPEN_PERM and populates their contents on access.
> > It needs a reliable path for updates & crash recovery.
> > The happy path for fanotify event processing is as follows:
> >
> > 1. A notification is read from fanotify file descriptor
> > 2. File contents are populated
> > 3. We write back FAN_ALLOW to fanotify file descriptor, or DENY if content population failed.
> >
> > We would like to guarantee that all file accesses receive an ALLOW or DENY response, and no events are lost.
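
The happy path described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the daemon's actual code:
populate_fn, send_verdict() and handle_batch() are hypothetical names, and
error handling is kept minimal. The fanotify calls, macros and structures
themselves are the documented userspace API from <sys/fanotify.h>.

```c
#include <errno.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Hypothetical hook: fill in the file behind event_fd; 0 means success. */
typedef int (*populate_fn)(int event_fd);

/* Write one verdict for a permission event. Returns 0 on success, -1 on error. */
int send_verdict(int fan_fd, int event_fd, int allow)
{
    struct fanotify_response resp = {
        .fd = event_fd,
        .response = allow ? FAN_ALLOW : FAN_DENY,
    };
    ssize_t n = write(fan_fd, &resp, sizeof(resp));

    return n == (ssize_t)sizeof(resp) ? 0 : -1;
}

/*
 * Read one batch of events and answer every FAN_OPEN_PERM event in it.
 * Returns the number of events answered, or -1 on error.
 */
int handle_batch(int fan_fd, populate_fn populate)
{
    struct fanotify_event_metadata buf[64];
    ssize_t len = read(fan_fd, buf, sizeof(buf));
    int answered = 0;

    if (len < 0)
        return errno == EINTR ? 0 : -1;

    for (struct fanotify_event_metadata *ev = buf; FAN_EVENT_OK(ev, len);
         ev = FAN_EVENT_NEXT(ev, len)) {
        if (ev->vers != FANOTIFY_METADATA_VERSION)
            return -1;
        if (ev->fd == FAN_NOFD)      /* e.g. queue overflow: nothing to answer */
            continue;
        if (ev->mask & FAN_OPEN_PERM) {
            int ok = populate(ev->fd) == 0;

            if (send_verdict(fan_fd, ev->fd, ok) < 0) {
                close(ev->fd);
                return -1;
            }
            answered++;
        }
        close(ev->fd);               /* each event carries its own open fd */
    }
    return answered;
}
```

Note the window this thread is about: an event that has been returned by
read() but not yet answered by send_verdict() exists only in the daemon's
memory, so it is lost if the daemon dies between those two steps.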
> 
> Makes sense.
> 
> > Unfortunately, today a filesystem client can hang (in D state)
> > if the event-handler daemon crashes or restarts at the wrong time.
> 
> Can you provide exact stack traces for those cases?
> 
> I wonder how a process gets into D state given commit fabf7f29b3e2
> ("fanotify: Use interruptible wait when waiting for permission events")

D state is expected while waiting for a response. We use a
TASK_UNINTERRUPTIBLE sleep (the above commit had to be effectively
reverted), but we also set TASK_KILLABLE and TASK_FREEZABLE so that we
don't block hibernation and tasks can still be killed when the fanotify
listener misbehaves.

But what confuses me is the following: you have a fanotify instance for
which you got an fd from fanotify_init(). For any process to be hanging,
this fd must still be held open by some process. Otherwise the fanotify
instance gets destroyed and all processes are free to run (they get a
FAN_ALLOW reply if they were already waiting). So the fact that you see
processes hanging when your fanotify listener crashes means that you have
likely leaked the fd to some other process (lsof should be able to tell you
which process still holds a handle to the fanotify instance). And the
kernel has no way to know that this is not the process that will eventually
read these events and reply...

> > In this case, any events that have been read but not yet responded to would be lost.
> > Initially we considered handling this internally by saving the file descriptors for pending events,
> > however this proved to be complex to do in a robust manner.
> >
> > A more robust solution is to add a kernel fanotify api which resets the fanotify pending event queue,
> > thereby allowing us to recover pending events in the case of daemon restart.
> > A strawman implementation of this approach is in
> > https://github.com/torvalds/linux/compare/master...ibrahim-jirdeh:linux:fanotify-reset-pending,
> > a new ioctl that resets `group->fanotify_data.access_list`.
> > One other alternative we considered is directly exposing the pending event queue itself
> > (https://github.com/torvalds/linux/commit/cd90ff006fa2732d28ff6bb5975ca5351ce009f1)
> > to support monitoring pending events, but simply resetting the queue is likely sufficient for our use-case.
> >
> > What do you think of exposing this functionality in fanotify?
> >
> 
> Ignoring the pending events for a start, how do you deal with access to
> non-populated files while the daemon is down?
> 
> We were throwing around some ideas about having a mount option (something
> like a "moderated" mount) to determine the default response for specific
> permission events (e.g. FAN_OPEN_PERM) in the case that there is
> no listener watching this event.
> 
> If you have a filesystem which may contain non-populated files, you
> mount it as a "moderated" mount and then access to all files is
> denied until the daemon is running and also denied if the daemon is down.
> 
> For restart, it might make sense to start a new daemon to start listening
> to events before stopping the old daemon.
> If the new daemon gets the events before the old daemon, things should
> be able to transition smoothly.

I agree this would be a sensible protocol for updates. For unplanned crashes
I agree we need something like the "moderated" mount option.

> Of course, if an old daemon can shut down and leave processes in
> uninterruptible sleep, that is a critical bug that needs to be fixed
> regardless of the handover problem.

If the fanotify fd got closed and the instance was shut down, this would
indeed be a serious bug (likely a UAF issue). But so far I rather suspect
the fd is just sitting in some fd table somewhere...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



