Re: [PATCH] fanotify: support custom default close response

Jan Kara <jack@xxxxxxx> · Wed, 2 Jul 2025 18:15:49 +0200

On Mon 30-06-25 17:56:00, Amir Goldstein wrote:
> On Mon, Jun 30, 2025 at 5:36 PM Jan Kara <jack@xxxxxxx> wrote:
> > On Tue 24-06-25 08:30:03, Amir Goldstein wrote:
> > > On Mon, Jun 23, 2025 at 9:26 PM Ibrahim Jirdeh <ibrahimjirdeh@xxxxxxxx> wrote:
> > > >
> > > > Currently the default response for pending events is FAN_ALLOW.
> > > > This makes default close response configurable. The main goal
> > > > of these changes would be to provide better handling for pending
> > > > events for lazy file loading use cases which may back fanotify
> > > > events by a long-lived daemon. For earlier discussion see:
> > > > https://lore.kernel.org/linux-fsdevel/6za2mngeqslmqjg3icoubz37hbbxi6bi44canfsg2aajgkialt@c3ujlrjzkppr/
> > >
> > > These lore links are typically placed at the commit message tail block
> > > if related to a suggestion you would typically use:
> > >
> > > Suggested-by: Amir Goldstein <amir73il@xxxxxxxxx>
> > > Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxi6PvAcT1vL0d0e+7YjvkfU-kwFVVMAN-tc-FKXe1wtSg@xxxxxxxxxxxxxx/
> > > Signed-off-by: Ibrahim Jirdeh <ibrahimjirdeh@xxxxxxxx>
> > >
> > > This way reviewers whose response is "what a terrible idea!" can
> > > point their arrows at me instead of you ;)
> > >
> > > Note that this is a more accurate link to the message where the default
> > > response API was proposed, so readers won't need to sift through
> > > this long thread to find the reference.
> >
> > I've reread that thread to remember how this is supposed to be used. After
> > thinking about it now maybe we could just modify how pending fanotify
> > events behave in case of group destruction? Instead of setting FAN_ALLOW in
> > fanotify_release() we'd set a special event state that will make fanotify
> > group iteration code bubble up back to fsnotify() and restart the event
> > generation loop there?
> >
> > In the usual case this would behave the same way as setting FAN_ALLOW (just
> > in case of multiple permission event watchers some will get the event two
> > times which shouldn't matter). In case of careful design with fd store
> > etc., the daemon can setup the new notification group as needed and then
> > close the fd from the old notification group at which point it would
> > receive all the pending events in the new group. I can even see us adding
> > ioctl to the fanotify group which when called will result in the same
> > behavior (i.e., all pending permission events will get the "retry"
> > response). That way the new daemon could just take the old fd from the fd
> > store and call this ioctl to receive all the pending events again.
> >
> > No need for the new default response. We probably need to provide a group
> > feature flag for this so that userspace can safely query kernel support for
> > this behavior but otherwise even that wouldn't be really needed.
> >
> > What do you guys think?
> 
> With proper handover I am not sure why this is needed, because:
> - new group gets fd from store and signals old group
> - old group does not take any new event, completes in-flight events,
>   signals back new group and exists
> - new group starts processing events
> - so why do we need a complex mechanism in kernel to do what can easily
>   be done in usersapce?

This works for clean handover (say service update) - no need for any
mechanism (neither retry nor default response there). We are in agreement
here. If retry is supported, it will make the handover somewhat simpler for
userspace but that's not really substantial.

> Also, regardless I think that we need the default response, because:
> - groups starts, set default response to DENY and handsover fd to store
> - if group crashes unexpectedly, access to all files is denied, which is
>   exactly what we needed to do with the "moderated" mount
> - it is similar to access to FUSE mount when server is down

Yes, crashes are what I had in mind. With crashes you have nobody to
cleanly handle events still pending for the old group and you have to solve
it. Reporting FAN_DENY (through default response) is one way, what I
suggest with retry has the advantage that userspace doesn't have to deal
with spurious FAN_DENY errors in case of daemon crash. It is not a huge
benefit (crashes should better be rare ;)) but it is IMO a benefit.

Now regarding your comment about moderated mount: You are somewhat terse on
details so let me try to fill in. First let's differentiate between service
(daemon) and the notification group because they may have a different
lifetime. So the service starts, creates a notification group, places mark
on the sb with pre-content events. You didn't mention a mark in your
description but until the mark is set, the group receives no events so
there's nothing to respond to. Now if the service crashes there are two
options:

1) You didn't save your group fd anywhere. In that case the group and mark
is gone with the crash of the service, all accesses happening after this
moment are allowed => not good. Whether we have default response or not
doesn't really matter in this case for those few events that were possibly
pending. Agreed so far?

2) You've saved group fd to fd store. In this case the group is (from
kernel POV) fully alive even after the service crash and the default
response does not activate. The events will be queued to the group and
waiting for reply. No access to the fs is allowed to happen which is good.
Eventually the new service starts and we are in the situation I describe 3
paragraphs above about handling pending events.

So if we'd implement resending of pending events after group closure, I
don't see how default response (at least in its current form) would be
useful for anything.

Why I like the proposal of resending pending events:
a) No spurious FAN_DENY errors in case of service crash
b) No need for new concept (and API) for default response, just a feature
   flag.
c) With additional ioctl to trigger resending pending events without group
   closure, the newly started service can simply reuse the
   same notification group (even in case of old service crash) thus
   inheriting all placed marks (which is something Ibrahim would like to
   have).

Regarding complexity of the approach I propose the attached (untested)
patch should implement it and I don't think it is very complex logic...
So what do you think now? :)

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR