On 6/24/25, 3:41 AM, "Amir Goldstein" <amir73il@xxxxxxxxx <mailto:amir73il@xxxxxxxxx>> wrote: > On Mon, Jun 23, 2025 at 9:36 PM Ibrahim Jirdeh <ibrahimjirdeh@xxxxxxxx> wrote: > > > > This adds support for responding to events via a unique event > > identifier. The main goal is to prevent races if there are multiple > > processes backing the same fanotify group (eg. handover of fanotify > > group to new instance of a backing daemon). A new event id field is > > added to fanotify metadata which is unique per group, and this behavior > > is guarded by FAN_ENABLE_EVENT_ID flag. > > FWIW, we also need this functionality for reporting pre-dir-content > events without an event->fd: > https://lore.kernel.org/linux-fsdevel/2dx3pbcnv5w75fxb2ghqtsk6gzl6cuxmd2rinzwbq7xxfjf5z7@3nqidi3mno46/ > > In theory, if you can manage with pre-content events that report fid > instead of open O_PATH fd, then we can add support for this mode > and tie the event->id solution with the FAN_NOFD case, but I am not sure > whether this would be too limiting for users. > > You will notice that in this patch review, I am adding more questions > than answers. > > The idea is to spark a design discussion and see if we can reach > consensus before you post v2. Thanks for the thoughtful feedback / discussion. I had a few questions for the high level comment about separating response id from event id, but will make sure to incorporate the additional suggestions eg. using s32 / idr depending on which approach we choose. > > > > > Some related discussion which this follows: > > https://lore.kernel.org/all/CAOQ4uxhuPBWD=TYZw974NsKFno-iNYSkHPw6WTfG_69ovS=nJA@xxxxxxxxxxxxxx/ > > > > Signed-off-by: Ibrahim Jirdeh <ibrahimjirdeh@xxxxxxxx> > > --- > > fs/notify/fanotify/fanotify.h | 1 + > > fs/notify/fanotify/fanotify_user.c | 37 +++++++++++++++++++++++------ > > include/linux/fanotify.h | 3 ++- > > include/linux/fsnotify_backend.h | 1 + > > include/uapi/linux/fanotify.h | 4 ++++ > > tools/include/uapi/linux/fanotify.h | 4 ++++ > > 6 files changed, 42 insertions(+), 8 deletions(-) > > > > diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h > > index b78308975082..383c28c3f977 100644 > > --- a/fs/notify/fanotify/fanotify.h > > +++ b/fs/notify/fanotify/fanotify.h > > @@ -442,6 +442,7 @@ struct fanotify_perm_event { > > u32 response; /* userspace answer to the event */ > > unsigned short state; /* state of the event */ > > int fd; /* fd we passed to userspace for this event */ > > + u64 event_id; /* unique event identifier for this event */ > > union { > > struct fanotify_response_info_header hdr; > > struct fanotify_response_info_audit_rule audit_rule; > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c > > index 02669abff4a5..c523c6283f1b 100644 > > --- a/fs/notify/fanotify/fanotify_user.c > > +++ b/fs/notify/fanotify/fanotify_user.c > > @@ -331,13 +331,15 @@ static int process_access_response(struct fsnotify_group *group, > > { > > struct fanotify_perm_event *event; > > int fd = response_struct->fd; > > + u64 event_id = response_struct->event_id; > > u32 response = response_struct->response; > > int errno = fanotify_get_response_errno(response); > > int ret = info_len; > > struct fanotify_response_info_audit_rule friar; > > > > - pr_debug("%s: group=%p fd=%d response=%x errno=%d buf=%p size=%zu\n", > > - __func__, group, fd, response, errno, info, info_len); > > + pr_debug( > > + "%s: group=%p fd=%d event_id=%lld response=%x errno=%d buf=%p size=%zu\n", > > + __func__, group, fd, event_id, response, errno, info, info_len); > > /* > > * make sure the response is valid, if invalid we do nothing and either > > * userspace can send a valid response or we will clean it up after the > > @@ -398,13 +400,18 @@ static int process_access_response(struct fsnotify_group *group, > > ret = 0; > > } > > > > - if (fd < 0) > > + u64 id = FAN_GROUP_FLAG(group, FAN_ENABLE_EVENT_ID) ? event_id : fd; > > + > > + if (id < 0) > > u64 cannot be negative. > I think we need to keep negative error values as invalid for better > backward compat > e.g. in case someone ends up writing FAN_NOFD in the response. > > Jan has suggested that we use an idr (could be cyclic) for event id and then > s32 is enough for a permission event/response id. > We could even start idr range at 256 and use response_struct->fd < -255 > range as non-fd event ids. > We skip the range -255..-1 to continue to support FAN_REPORT_FD_ERROR. > > This is convenient if we agree to overload event->fd and never need to > report both path fd and an event id. > > OTOH, I can envision other uses of a u64 event id, unrelated to permission > event response. > > I am considering extending fanotify API as a standard API to access fs > built-in persistent change journal, for fs that support it (e.g. lustre, ntfs). > In those filesystems, the persistent events have a u64 id, > so extending the fanotify API to describe the event with u64 id could be > useful down the road. > > But an event id and permission response id do not have to be the same id.. > > > > return -EINVAL; > > > > spin_lock(&group->notification_lock); > > list_for_each_entry(event, &group->fanotify_data.access_list, > > fae.fse.list) { > > - if (event->fd != fd) > > + u64 other_id = FAN_GROUP_FLAG(group, FAN_ENABLE_EVENT_ID) ? > > + event->event_id : > > + event->fd; > > + if (other_id != id) > > continue; > > > > list_del_init(&event->fae.fse.list); > > @@ -815,6 +822,15 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, > > else > > metadata.fd = fd >= 0 ? fd : FAN_NOFD; > > > > + /* > > + * Populate unique event id for group with FAN_ENABLE_EVENT_ID. > > + */ > > + if (FAN_GROUP_FLAG(group, FAN_ENABLE_EVENT_ID)) > > + metadata.event_id = > > + (u64)atomic64_inc_return(&group->event_id_counter); > > + else > > + metadata.event_id = -1; > > + > > if (pidfd_mode) { > > /* > > * Complain if the FAN_REPORT_PIDFD and FAN_REPORT_TID mutual > > @@ -865,8 +881,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, > > if (pidfd_file) > > fd_install(pidfd, pidfd_file); > > > > - if (fanotify_is_perm_event(event->mask)) > > + if (fanotify_is_perm_event(event->mask)) { > > FANOTIFY_PERM(event)->fd = fd; > > + FANOTIFY_PERM(event)->event_id = metadata.event_id; > > + } > > Need to fix all the uses of FAN_EVENT_METADATA_LEN macro to be made > conditional of FAN_REPORT_EVENT_ID, so we do not copy out this new field > without user opt-in. > > > > > return metadata.event_len; > > > > @@ -951,7 +969,11 @@ static ssize_t fanotify_read(struct file *file, char __user *buf, > > if (!fanotify_is_perm_event(event->mask)) { > > fsnotify_destroy_event(group, &event->fse); > > } else { > > - if (ret <= 0 || FANOTIFY_PERM(event)->fd < 0) { > > + u64 event_id = > > + FAN_GROUP_FLAG(group, FAN_ENABLE_EVENT_ID) ? > > + FANOTIFY_PERM(event)->fd : > > + FANOTIFY_PERM(event)->event_id; > > + if (ret <= 0 || event_id < 0) { > > spin_lock(&group->notification_lock); > > finish_permission_event(group, > > FANOTIFY_PERM(event), FAN_DENY, NULL); > > @@ -1649,6 +1671,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags) > > } > > > > group->default_response = FAN_ALLOW; > > + atomic64_set(&group->event_id_counter, 0); > > > > BUILD_BUG_ON(!(FANOTIFY_ADMIN_INIT_FLAGS & FAN_UNLIMITED_QUEUE)); > > if (flags & FAN_UNLIMITED_QUEUE) { > > @@ -2115,7 +2138,7 @@ static int __init fanotify_user_setup(void) > > FANOTIFY_DEFAULT_MAX_USER_MARKS); > > > > BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS); > > - BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 14); > > + BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 15); > > BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 11); > > > > fanotify_mark_cache = KMEM_CACHE(fanotify_mark, > > diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h > > index 182fc574b848..08bdb7aac070 100644 > > --- a/include/linux/fanotify.h > > +++ b/include/linux/fanotify.h > > @@ -38,7 +38,8 @@ > > FAN_REPORT_PIDFD | \ > > FAN_REPORT_FD_ERROR | \ > > FAN_UNLIMITED_QUEUE | \ > > - FAN_UNLIMITED_MARKS) > > + FAN_UNLIMITED_MARKS | \ > > + FAN_ENABLE_EVENT_ID) > > > > /* > > * fanotify_init() flags that are allowed for user without CAP_SYS_ADMIN. > > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h > > index 9683396acda6..58584a4e500a 100644 > > --- a/include/linux/fsnotify_backend.h > > +++ b/include/linux/fsnotify_backend.h > > @@ -232,6 +232,7 @@ struct fsnotify_group { > > enum fsnotify_group_prio priority; /* priority for sending events */ > > bool shutdown; /* group is being shut down, don't queue more events */ > > unsigned int default_response; /* default response sent on group close */ > > + atomic64_t event_id_counter; /* counter to generate unique event ids */ > > > > #define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */ > > #define FSNOTIFY_GROUP_DUPS 0x02 /* allow multiple marks per object */ > > diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h > > index 7badde273a66..e9fb8827fe1b 100644 > > --- a/include/uapi/linux/fanotify.h > > +++ b/include/uapi/linux/fanotify.h > > @@ -67,6 +67,8 @@ > > #define FAN_REPORT_TARGET_FID 0x00001000 /* Report dirent target id */ > > #define FAN_REPORT_FD_ERROR 0x00002000 /* event->fd can report error */ > > #define FAN_REPORT_MNT 0x00004000 /* Report mount events */ > > +/* Flag to populate and respond using unique event id */ > > +#define FAN_ENABLE_EVENT_ID 0x00008000 > > While it's true that the feature impacts more than the reported event > format, this is the main thing that it does, so please name it > FAN_REPORT_EVENT_ID > > > > > /* Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_DIR_FID */ > > #define FAN_REPORT_DFID_NAME (FAN_REPORT_DIR_FID | FAN_REPORT_NAME) > > @@ -143,6 +145,7 @@ struct fanotify_event_metadata { > > __aligned_u64 mask; > > __s32 fd; > > __s32 pid; > > + __u64 event_id; > > }; > > Not sure if this change calls for bumping FANOTIFY_METADATA_VERSION > but I think we should not extend the reported metadata_len > unless the user opted-in with FAN_REPORT_EVENT_ID. > > If we do that we may break buggy programs that use the > FAN_EVENT_METADATA_LEN macro and it is an unneeded risk. > > We should probably deprecate the FAN_EVENT_METADATA_LEN > definition in uapi/fanotify.h altogether and re-write the FAN_EVENT_OK() macro. > > I know I asked for it, but extending fanotify_event_metadata does seem > to be a bit of a pain. > Do we prefer to scope this change to adding (s32) response id and not add new event id field yet. > Thinking out loud, if we use idr to allocate an event id, as Jan suggested, > and we do want to allow it along side event->fd, > then we could also overload event->pid, i.e. the meaning of > FAN_ERPORT_EVENT_ID would be "event->pid is an event id", > Similarly to the way that we overloaded event->pid with FAN_REPORT_TID. > Users that need both event id and pid can use FAN_REPORT_PIDFD. > At least for our usecase, having event->fd along with response id available would be helpful as we do not use fid mode mentioned above. Would it make sense to add response id as an extra information record (similar to pidfd) rather than overloading event->fd / event->pid? > > > > #define FAN_EVENT_INFO_TYPE_FID 1 > > @@ -226,6 +229,7 @@ struct fanotify_event_info_mnt { > > > > struct fanotify_response { > > __s32 fd; > > + __u64 event_id; > > __u32 response; > > }; > > > > Most likely we will end up with s32 response id, but if we do decide on > a separate u64 field, please move it to end for better struct alignment. > > Thanks, > Amir.