On Mon, Jul 14, 2025 at 7:25 PM Jan Kara <jack@xxxxxxx> wrote: > > On Sat 12-07-25 10:08:25, Amir Goldstein wrote: > > On Sat, Jul 12, 2025 at 12:37 AM Ibrahim Jirdeh <ibrahimjirdeh@xxxxxxxx> wrote: > > > > > > > On Thu, Jul 3, 2025 at 4:43 PM Jan Kara <jack@xxxxxxx> wrote: > > > > > > > > > > On Thu 03-07-25 10:27:17, Amir Goldstein wrote: > > > > > > On Thu, Jul 3, 2025 at 9:10 AM Ibrahim Jirdeh <ibrahimjirdeh@xxxxxxxx> wrote: > > > > > > > > > > > > > > > On Wed, Jul 2, 2025 at 6:15 PM Jan Kara <jack@xxxxxxx> wrote: > > > > > > > > > Eventually the new service starts and we are in the situation I describe 3 > > > > > > > > > paragraphs above about handling pending events. > > > > > > > > > > > > > > > > > > So if we'd implement resending of pending events after group closure, I > > > > > > > > > don't see how default response (at least in its current form) would be > > > > > > > > > useful for anything. > > > > > > > > > > > > > > > > > > Why I like the proposal of resending pending events: > > > > > > > > > a) No spurious FAN_DENY errors in case of service crash > > > > > > > > > b) No need for new concept (and API) for default response, just a feature > > > > > > > > > flag. > > > > > > > > > c) With additional ioctl to trigger resending pending events without group > > > > > > > > > closure, the newly started service can simply reuse the > > > > > > > > > same notification group (even in case of old service crash) thus > > > > > > > > > inheriting all placed marks (which is something Ibrahim would like to > > > > > > > > > have). > > > > > > > > > > > > > > > > > > > > > > I'm also a fan of the approach of support for resending pending events. As > > > > > > > mentioned exposing this behavior as an ioctl and thereby removing the need to > > > > > > > recreate fanotify group makes the usage a fair bit simpler for our case. > > > > > > > > > > > > > > One basic question I have (mainly for understanding), is if the FAN_RETRY flag is > > > > > > > set in the proposed patch, in the case where there is one existing group being > > > > > > > closed (ie no handover setup), what would be the behavior for pending events? > > > > > > > Is it the same as now, events are allowed, just that they get resent once? > > > > > > > > > > > > Yes, same as now. > > > > > > Instead of replying FAN_ALLOW, syscall is being restarted > > > > > > to check if a new watcher was added since this watcher took the event. > > > > > > > > > > Yes, just it isn't the whole syscall that's restarted but only the > > > > > fsnotify() call. > > > > > > I was trying out the resend patch Jan posted in this thread along with a > > > simple ioctl to trigger the resend flow - it worked well, any remaining > > > concerns with exposing this functionality? If not I could go ahead and > > > pull in Jan's change and post it with additional ioctl. > > > > I do not have any concern about the retry behavior itself, > > but about the ioctl, feature dependency and test matrix. > > > > Regarding the ioctl, it occured to me that it may be a foot gun. > > Once the new group interrupts all the in-flight events, > > if due to a userland bug, this is done without full collaboration > > with old group, there could be nasty races of both old and new > > groups responding to the same event, and with recyclable > > ida response ids that could cause a real mess. > > > > Of course you can say it is userspace blame, but the smooth > > handover from old service to new service instance is not always > > easy to get right, hence, a foot gun. > > > > If we implement the control-fd/queue-fd design, we would > > not have this problem. > > The ioctl to open an event-queue-fd would fail it a queue > > handler fd is already open. > > IOW, the handover is kernel controlled and much safer. > > For the sake of discussion let's call this feature > > FAN_CONTROL_FD and let it allow the ioctl > > IOC_FAN_OPEN_QUEUE_FD. > > I agree this is probably a safer variant. > > > The simpler functionality of FAN_RETRY_UNANSWERED > > may be useful regardless of the handover ioctl (?), but if we > > agree that the correct design for handover is the control fd design, > > and this design will require a feature flag anyway, > > then I don't think that we need two feature flags. > > > > If users want the retry unanswered functionality, they can use the > > new API for control fd, regardless of whether they intend to store > > the fd and do handover or not. > > > > The control fd API means that when a *queue* fd is released, > > events remain in pending state until a new queue fd is opened > > and can also imply the retry unanswered behavior, > > when the *control* fd is released. > > Right, given with queue-fd design we actually have clear "successor" fd > where to report already reported but not answered events. So we can just > move back unanswered events on close of old queue fd so they'd be reported > again on read from the new queue fd. So there will be no need to bother > other notification groups with resent events with this design. > Yes, we'd need to be careful with this new event state transition to make sure that we "recycle" the event properly. > > I don't think there is much to lose from this retry behavior. > > The only reason we want to opt-in for it is to avoid surprises of > > behavior change in existing deployments. > > > > While we could have FAN_RETRY_UNANSWERED as an > > independent feature without a handover ioctl, > > In order to avoid test matrix bloat, at least for a start (we can relax later), > > I don't think that we should allow it as an independent feature > > and especially not for legacy modes (i.e. for Anti-Virus) unless there > > is a concrete user requesting/testing these use cases. > > With queue-fd design I agree there's no reason not to have the "resend > pending events" behavior from the start. > > > Going on about feature dependency. > > > > Practically, a handover ioctl is useless without > > FAN_REPORT_RESPONSE_ID, so for sure we need to require > > FAN_REPORT_RESPONSE_ID for the handover ioctl feature. > > > > Because I do not see an immediate use case for > > FAN_REPORT_RESPONSE_ID without handover, > > I would start by only allowing them together and consider relaxing > > later if such a use case is found. > > We can tie them together but I think queue-fd design doesn't require > FAN_REPORT_RESPONSE_ID. Since we resend events anyway, we'd generate new > fds for events as well and things would just work AFAICT. > Right. hmm. I'd still like to consider the opportunity of the new-ish API for deprecating some old legacy API baggage. For example: do you consider the fact that async events are mixed together with permission/pre-content events in the same queue an historic mistake or a feature? I'm tempted, as we split the control fd from the queue fd to limit a queue to events of the same "type". Maybe an O_RDONLY async queue fd or an O_RDWR permission events queue fd or only allow the latter with the new API? I am not sure what the best semantics would be and I'd hate to send Ibrahim on another walk into the woods with this added requirement. WDYT? Is it worth adding some limitations that may be relaxed later? which? I'd be ok with limiting to permission events only for a start. Ibrahim, Please note that FAN_CLOEXEC and FAN_NONBLOCK currently apply to the control fd. I guess they can also apply to the queue fd, Anyway the control fd should not be O_RDWR probably O_RDONLY even though it's not for reading. Thanks, Amir.