On Mon, Jul 14, 2025 at 9:02 AM Ibrahim Jirdeh <ibrahimjirdeh@xxxxxxxx> wrote:
>
> On 7/12/25, 1:08 AM, "Amir Goldstein" <amir73il@xxxxxxxxx> wrote:
>
> > Regarding the ioctl, it occurred to me that it may be a foot gun.
> > Once the new group interrupts all the in-flight events,
> > if, due to a userland bug, this is done without full collaboration
> > with the old group, there could be nasty races of both old and new
> > groups responding to the same event, and with recyclable
> > IDA response ids that could cause a real mess.
>
> Makes sense. I had only considered an "ideal" usage where the resend
> ioctl is synchronized. Sounds reasonable to provide stronger guarantees
> within the surfaced api.
>
> > If we implement the control-fd/queue-fd design, we would
> > not have this problem.
> > The ioctl to open an event-queue-fd would fail if a queue
> > handler fd is already open.
>
> I had a few questions around the control-fd/queue-fd api you outlined.
> Most basically, in the new design, do we now only allow reading events /
> writing responses through the issued queue-fd?

Correct. The fanotify control fd is what keeps the group object alive,
and it is used for fanotify_mark() and for the ioctl that generates the
queue-fd.
The queue-fd is for fanotify read (events) and fanotify write (responses).

> > The control fd API means that when a *queue* fd is released,
> > events remain in pending state until a new queue fd is opened,
> > and can also imply the retry unanswered behavior
> > when the *control* fd is released.
>
> It may match what you are saying, but is it safe to simply trigger the
> retry unanswered flow for pending events (events that are read but not
> answered) on queue fd release?

Yes, you are right. This makes sense.
I did not say this correctly; you wrote it more accurately.

> And similarly, the control fd release would
> just match the current release flow of allowing / resending all queued
> events + destroying the group.
Yes, and that allows a handover without an fd store:
- Start new group, set up marks, open queue fd, don't read from it
- Stop old group
- New group starts reading events (including the resent ones)

> And in terms of api usage, does something like the following look
> reasonable for the handover:
>
> - Control fd is still kept in the fd store just like the current setup
> - Queue fd is not. This way, on daemon restart/crash we will always
>   resend any pending events via the queue fd release
> - On daemon startup, always call the ioctl to reissue a new queue fd

Yes. Exactly. Sounds simple and natural.
There may be complications, but I do not see them yet.

> > Because I do not see an immediate use case for
> > FAN_REPORT_RESPONSE_ID without handover,
> > I would start by only allowing them together and consider relaxing
> > later if such a use case is found.
> >
> > I will even consider taking this further and start with
> > FAN_CLASS_PRE_CONTENT_FID requiring
> > both the new feature flags.
>
> The feature dependence sounds reasonable to me. We will need both
> FAN_REPORT_RESPONSE_ID and retry behavior + something like the proposed
> control fd api to robustly handle pending events.
>
> > Am I missing anything about Meta use cases
> > or the risks in the resend pending events ioctl?
>
> I don't think there's any other complication related to pending events
> in our use case. And based on my understanding of the api you proposed,
> it should address our case well. I can briefly mention why it's
> desirable to have some mechanism to trigger resend while still using
> the same group; I might have added this in a previous discussion.
> Apart from interested (mounts of) directories, we are also adding
> ignore marks for all populated files. So we would need to recreate this
> state if just relying on retry behavior triggering on group close.
> It's doable on the use case side, but probably a bit tricky versus
> being able to continue to use the existing group, which has the proper
> state.
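To make the intended restart semantics concrete, here is a toy model in
Python. To be clear, this is NOT the fanotify API (the proposed ioctl does
not exist yet); the Group/Queue classes and their method names are made up
for illustration. It only models the behavior discussed above: the control
fd (the Group) survives a daemon crash via the fd store, at most one queue
fd can be open at a time, and read-but-unanswered events return to pending
on queue fd release, to be resent on the next queue fd:

```python
class Group:
    """Stands in for the fanotify group kept alive by the control fd."""

    def __init__(self):
        self.pending = []        # queued events, not yet read
        self.unanswered = []     # read but not yet answered
        self.queue_open = False

    def open_queue(self):
        # "The ioctl to open an event-queue-fd would fail if a queue
        # handler fd is already open."
        if self.queue_open:
            raise OSError("EBUSY: a queue fd is already open")
        self.queue_open = True
        return Queue(self)


class Queue:
    """Stands in for the queue fd: the only way to read/respond."""

    def __init__(self, group):
        self.group = group

    def read_event(self):
        ev = self.group.pending.pop(0)
        self.group.unanswered.append(ev)
        return ev

    def answer(self, ev):
        self.group.unanswered.remove(ev)

    def close(self):
        # Queue fd release: events read but not answered go back to
        # pending, to be resent on the next queue fd.
        self.group.pending = self.group.unanswered + self.group.pending
        self.group.unanswered = []
        self.group.queue_open = False


if __name__ == "__main__":
    # Control fd lives in the fd store and survives a crash;
    # the queue fd does not.
    group = Group()
    group.pending = ["event-1", "event-2"]

    q = group.open_queue()
    ev = q.read_event()       # daemon reads event-1, then crashes
    q.close()                 # crash => queue fd released, event-1 resent

    q2 = group.open_queue()   # restarted daemon reissues a queue fd
    print(q2.read_event())    # event-1 again (resent, not lost)
    print(q2.read_event())    # event-2
```

Note that the model also captures the exclusivity guarantee: a second
open_queue() while one queue is open raises EBUSY, which is what closes
the foot-gun of two daemon instances answering the same event.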
I needed no explanation - this was clear to me, but maybe someone else
did, so it's good that you wrote it ;)

But I think that besides the convenience of keeping the marks, it is not
really doable to restart the service with a guarantee that:
- You won't lose any event
- Users will not be denied access between service instances

Especially if the old service instance could be hung and killed by a
watchdog, IMO the restart mechanism is a must for a smooth and safe
handover.

Thanks,
Amir.