On Mon, Jul 14, 2025 at 9:02 AM Ibrahim Jirdeh <ibrahimjirdeh@xxxxxxxx> wrote:
>
> On 7/12/25, 1:08 AM, "Amir Goldstein" <amir73il@xxxxxxxxx> wrote:
>
> > Regarding the ioctl, it occurred to me that it may be a foot gun.
> > Once the new group interrupts all the in-flight events,
> > if, due to a userland bug, this is done without full collaboration
> > with the old group, there could be nasty races of both old and new
> > groups responding to the same event, and with recyclable
> > IDA response ids that could cause a real mess.
>
> Makes sense. I had only considered an "ideal" usage where the resend
> ioctl is synchronized. Sounds reasonable to provide stronger guarantees
> within the surfaced api.
>
> > If we implement the control-fd/queue-fd design, we would
> > not have this problem.
> > The ioctl to open an event-queue-fd would fail if a queue
> > handler fd is already open.
>
> I had a few questions around the control-fd/queue-fd api you outlined.
> Most basically, in the new design, do we now only allow reading events /
> writing responses through the issued queue-fd?

Correct. The fanotify control fd is what keeps the group object alive,
and it is used for fanotify_mark() and for the ioctl that generates the
queue-fd.
The queue-fd is for fanotify read (events) and fanotify write (responses).

> > The control fd API means that when a *queue* fd is released,
> > events remain in pending state until a new queue fd is opened,
> > and can also imply the retry unanswered behavior
> > when the *control* fd is released.
>
> It may match what you are saying, but is it safe to simply trigger the
> retry unanswered flow for pending events (events that are read but not
> answered) on queue fd release?

Yes, you are right. This makes sense.
I did not say this correctly; you wrote it more accurately.

> And similarly, the control fd release would
> just match the current release flow of allowing / resending all queued
> events + destroying the group.
Yes, and that allows a handover without an fd store:
- Start new group, set up marks, open queue fd, don't read from it
- Stop old group
- New group starts reading events (including the resent ones)

> And in terms of api usage, does something like the following look
> reasonable for the handover:
>
> - Control fd is still kept in the fd store just like the current setup
> - Queue fd is not. This way, on daemon restart/crash we will always
>   resend any pending events via the queue fd release
> - On daemon startup, always call the ioctl to reissue a new queue fd

Yes. Exactly. Sounds simple and natural.
There may be complications, but I do not see them yet.

> > Because I do not see an immediate use case for
> > FAN_REPORT_RESPONSE_ID without handover,
> > I would start by only allowing them together and consider relaxing
> > later if such a use case is found.
> >
> > I will even consider taking this further and start with
> > FAN_CLASS_PRE_CONTENT_FID requiring
> > both the new feature flags.
>
> The feature dependence sounds reasonable to me. We will need both
> FAN_REPORT_RESPONSE_ID and retry behavior + something like the proposed
> control fd api to robustly handle pending events.
>
> > Am I missing anything about Meta use cases
> > or the risks in the resend pending events ioctl?
>
> I don't think there's any other complication related to pending events
> in our use case. And based on my understanding of the api you proposed,
> it should address our case well. I can briefly mention why it's
> desirable to have some mechanism to trigger resend while still using
> the same group; I might have added this in a previous discussion.
> Apart from interested (mounts of) directories, we are also adding
> ignore marks for all populated files. So we would need to recreate this
> state if just relying on retry behavior triggering on group close.
> It's doable on the use case side, but probably a bit tricky versus
> being able to continue to use the existing group, which has the proper
> state.
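To make the intended restart semantics concrete, here is a toy model in
Python. To be clear, this is NOT the fanotify API (the proposed ioctl does
not exist yet); the Group/Queue classes and their method names are made up
for illustration. It only models the behavior discussed above: the control
fd (the Group) survives a daemon crash via the fd store, at most one queue
fd can be open at a time, and read-but-unanswered events return to pending
on queue fd release, to be resent on the next queue fd:

```python
class Group:
    """Stands in for the fanotify group kept alive by the control fd."""

    def __init__(self):
        self.pending = []        # queued events, not yet read
        self.unanswered = []     # read but not yet answered
        self.queue_open = False

    def open_queue(self):
        # "The ioctl to open an event-queue-fd would fail if a queue
        # handler fd is already open."
        if self.queue_open:
            raise OSError("EBUSY: a queue fd is already open")
        self.queue_open = True
        return Queue(self)


class Queue:
    """Stands in for the queue fd: the only way to read/respond."""

    def __init__(self, group):
        self.group = group

    def read_event(self):
        ev = self.group.pending.pop(0)
        self.group.unanswered.append(ev)
        return ev

    def answer(self, ev):
        self.group.unanswered.remove(ev)

    def close(self):
        # Queue fd release: events read but not answered go back to
        # pending, to be resent on the next queue fd.
        self.group.pending = self.group.unanswered + self.group.pending
        self.group.unanswered = []
        self.group.queue_open = False


if __name__ == "__main__":
    # Control fd lives in the fd store and survives a crash;
    # the queue fd does not.
    group = Group()
    group.pending = ["event-1", "event-2"]

    q = group.open_queue()
    ev = q.read_event()       # daemon reads event-1, then crashes
    q.close()                 # crash => queue fd released, event-1 resent

    q2 = group.open_queue()   # restarted daemon reissues a queue fd
    print(q2.read_event())    # event-1 again (resent, not lost)
    print(q2.read_event())    # event-2
```

Note that the model also captures the exclusivity guarantee: a second
open_queue() while one queue is open raises EBUSY, which is what closes
the foot-gun of two daemon instances answering the same event.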
I needed no explanation - this was clear to me, but maybe someone else
did, so it's good that you wrote it ;)

But I think that besides the convenience of keeping the marks, it is not
really doable to restart the service with a guarantee that:
- You won't lose any event
- Users will not be denied access between service instances

Especially if the old service instance could be hung and killed by a
watchdog, IMO the restart mechanism is a must for a smooth and safe
handover.

Thanks,
Amir.