Re: [PATCH] fanotify: introduce unique event identifier

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 23, 2025 at 9:36 PM Ibrahim Jirdeh <ibrahimjirdeh@xxxxxxxx> wrote:
>
> This adds support for responding to events via a unique event
> identifier. The main goal is to prevent races if there are multiple
> processes backing the same fanotify group (eg. handover of fanotify
> group to new instance of a backing daemon). A new event id field is
> added to fanotify metadata which is unique per group, and this behavior
> is guarded by FAN_ENABLE_EVENT_ID flag.

FWIW, we also need this functionality for reporting pre-dir-content
events without an event->fd:
https://lore.kernel.org/linux-fsdevel/2dx3pbcnv5w75fxb2ghqtsk6gzl6cuxmd2rinzwbq7xxfjf5z7@3nqidi3mno46/

In theory, if you can manage with pre-content events that report fid
instead of open O_PATH fd, then we can add support for this mode
and tie the event->id solution with the FAN_NOFD case, but I am not sure
whether this would be too limiting for users.

You will notice that in this patch review, I am adding more questions
than answers.

The idea is to spark a design discussion and see if we can reach
consensus before you post v2.

>
> Some related discussion which this follows:
> https://lore.kernel.org/all/CAOQ4uxhuPBWD=TYZw974NsKFno-iNYSkHPw6WTfG_69ovS=nJA@xxxxxxxxxxxxxx/
>
> Signed-off-by: Ibrahim Jirdeh <ibrahimjirdeh@xxxxxxxx>
> ---
>  fs/notify/fanotify/fanotify.h       |  1 +
>  fs/notify/fanotify/fanotify_user.c  | 37 +++++++++++++++++++++++------
>  include/linux/fanotify.h            |  3 ++-
>  include/linux/fsnotify_backend.h    |  1 +
>  include/uapi/linux/fanotify.h       |  4 ++++
>  tools/include/uapi/linux/fanotify.h |  4 ++++
>  6 files changed, 42 insertions(+), 8 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index b78308975082..383c28c3f977 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -442,6 +442,7 @@ struct fanotify_perm_event {
>         u32 response;                   /* userspace answer to the event */
>         unsigned short state;           /* state of the event */
>         int fd;         /* fd we passed to userspace for this event */
> +       u64 event_id;           /* unique event identifier for this event */
>         union {
>                 struct fanotify_response_info_header hdr;
>                 struct fanotify_response_info_audit_rule audit_rule;
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 02669abff4a5..c523c6283f1b 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -331,13 +331,15 @@ static int process_access_response(struct fsnotify_group *group,
>  {
>         struct fanotify_perm_event *event;
>         int fd = response_struct->fd;
> +       u64 event_id = response_struct->event_id;
>         u32 response = response_struct->response;
>         int errno = fanotify_get_response_errno(response);
>         int ret = info_len;
>         struct fanotify_response_info_audit_rule friar;
>
> -       pr_debug("%s: group=%p fd=%d response=%x errno=%d buf=%p size=%zu\n",
> -                __func__, group, fd, response, errno, info, info_len);
> +       pr_debug(
> +               "%s: group=%p fd=%d event_id=%lld response=%x errno=%d buf=%p size=%zu\n",
> +               __func__, group, fd, event_id, response, errno, info, info_len);
>         /*
>          * make sure the response is valid, if invalid we do nothing and either
>          * userspace can send a valid response or we will clean it up after the
> @@ -398,13 +400,18 @@ static int process_access_response(struct fsnotify_group *group,
>                 ret = 0;
>         }
>
> -       if (fd < 0)
> +       u64 id = FAN_GROUP_FLAG(group, FAN_ENABLE_EVENT_ID) ? event_id : fd;
> +
> +       if (id < 0)

u64 cannot be negative.
I think we need to keep negative error values as invalid for better
backward compat
e.g. in case someone ends up writing FAN_NOFD in the response.

Jan has suggested that we use an idr (could be cyclic) for event id and then
s32 is enough for a permission event/response id.
We could even start idr range at 256 and use response_struct->fd < -255
range as non-fd event ids.
We skip the range -255..-1 to continue to support FAN_REPORT_FD_ERROR.

This is convenient if we agree to overload event->fd and never need to
report both path fd and an event id.

OTOH, I can envision other uses of a u64 event id, unrelated to permission
event response.

I am considering extending fanotify API as a standard API to access fs
built-in persistent change journal, for fs that support it (e.g. lustre, ntfs).
In those filesystems, the persistent events have a u64 id,
so extending the fanotify API to describe the event with u64 id could be
useful down the road.

But an event id and permission response id do not have to be the same id..


>                 return -EINVAL;
>
>         spin_lock(&group->notification_lock);
>         list_for_each_entry(event, &group->fanotify_data.access_list,
>                             fae.fse.list) {
> -               if (event->fd != fd)
> +               u64 other_id = FAN_GROUP_FLAG(group, FAN_ENABLE_EVENT_ID) ?
> +                                      event->event_id :
> +                                      event->fd;
> +               if (other_id != id)
>                         continue;
>
>                 list_del_init(&event->fae.fse.list);
> @@ -815,6 +822,15 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>         else
>                 metadata.fd = fd >= 0 ? fd : FAN_NOFD;
>
> +       /*
> +        * Populate unique event id for group with FAN_ENABLE_EVENT_ID.
> +        */
> +       if (FAN_GROUP_FLAG(group, FAN_ENABLE_EVENT_ID))
> +               metadata.event_id =
> +                       (u64)atomic64_inc_return(&group->event_id_counter);
> +       else
> +               metadata.event_id = -1;
> +
>         if (pidfd_mode) {
>                 /*
>                  * Complain if the FAN_REPORT_PIDFD and FAN_REPORT_TID mutual
> @@ -865,8 +881,10 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>         if (pidfd_file)
>                 fd_install(pidfd, pidfd_file);
>
> -       if (fanotify_is_perm_event(event->mask))
> +       if (fanotify_is_perm_event(event->mask)) {
>                 FANOTIFY_PERM(event)->fd = fd;
> +               FANOTIFY_PERM(event)->event_id = metadata.event_id;
> +       }

Need to fix all the uses of FAN_EVENT_METADATA_LEN macro to be made
conditional of FAN_REPORT_EVENT_ID, so we do not copy out this new field
without user opt-in.

>
>         return metadata.event_len;
>
> @@ -951,7 +969,11 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
>                 if (!fanotify_is_perm_event(event->mask)) {
>                         fsnotify_destroy_event(group, &event->fse);
>                 } else {
> -                       if (ret <= 0 || FANOTIFY_PERM(event)->fd < 0) {
> +                       u64 event_id =
> +                               FAN_GROUP_FLAG(group, FAN_ENABLE_EVENT_ID) ?
> +                                       FANOTIFY_PERM(event)->fd :
> +                                       FANOTIFY_PERM(event)->event_id;
> +                       if (ret <= 0 || event_id < 0) {
>                                 spin_lock(&group->notification_lock);
>                                 finish_permission_event(group,
>                                         FANOTIFY_PERM(event), FAN_DENY, NULL);
> @@ -1649,6 +1671,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>         }
>
>         group->default_response = FAN_ALLOW;
> +       atomic64_set(&group->event_id_counter, 0);
>
>         BUILD_BUG_ON(!(FANOTIFY_ADMIN_INIT_FLAGS & FAN_UNLIMITED_QUEUE));
>         if (flags & FAN_UNLIMITED_QUEUE) {
> @@ -2115,7 +2138,7 @@ static int __init fanotify_user_setup(void)
>                                      FANOTIFY_DEFAULT_MAX_USER_MARKS);
>
>         BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS);
> -       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 14);
> +       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 15);
>         BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 11);
>
>         fanotify_mark_cache = KMEM_CACHE(fanotify_mark,
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index 182fc574b848..08bdb7aac070 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -38,7 +38,8 @@
>                                          FAN_REPORT_PIDFD | \
>                                          FAN_REPORT_FD_ERROR | \
>                                          FAN_UNLIMITED_QUEUE | \
> -                                        FAN_UNLIMITED_MARKS)
> +                                        FAN_UNLIMITED_MARKS | \
> +                                        FAN_ENABLE_EVENT_ID)
>
>  /*
>   * fanotify_init() flags that are allowed for user without CAP_SYS_ADMIN.
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 9683396acda6..58584a4e500a 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -232,6 +232,7 @@ struct fsnotify_group {
>         enum fsnotify_group_prio priority;      /* priority for sending events */
>         bool shutdown;          /* group is being shut down, don't queue more events */
>         unsigned int default_response; /* default response sent on group close */
> +       atomic64_t event_id_counter; /* counter to generate unique event ids */
>
>  #define FSNOTIFY_GROUP_USER    0x01 /* user allocated group */
>  #define FSNOTIFY_GROUP_DUPS    0x02 /* allow multiple marks per object */
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index 7badde273a66..e9fb8827fe1b 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -67,6 +67,8 @@
>  #define FAN_REPORT_TARGET_FID  0x00001000      /* Report dirent target id  */
>  #define FAN_REPORT_FD_ERROR    0x00002000      /* event->fd can report error */
>  #define FAN_REPORT_MNT         0x00004000      /* Report mount events */
> +/* Flag to populate and respond using unique event id */
> +#define FAN_ENABLE_EVENT_ID            0x00008000

While it's true that the feature impacts more than the reported event
format, this is the main thing that it does, so please name it
FAN_REPORT_EVENT_ID

>
>  /* Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_DIR_FID */
>  #define FAN_REPORT_DFID_NAME   (FAN_REPORT_DIR_FID | FAN_REPORT_NAME)
> @@ -143,6 +145,7 @@ struct fanotify_event_metadata {
>         __aligned_u64 mask;
>         __s32 fd;
>         __s32 pid;
> +       __u64 event_id;
>  };

Not sure if this change calls for bumping FANOTIFY_METADATA_VERSION
but I think we should not extend the reported metadata_len
unless the user opted-in with FAN_REPORT_EVENT_ID.

If we do that we may break buggy programs that use the
FAN_EVENT_METADATA_LEN macro and it is an unneeded risk.

We should probably deprecate the FAN_EVENT_METADATA_LEN
definition in uapi/fanotify.h altogether and re-write the FAN_EVENT_OK() macro.

I know I asked for it, but extending fanotify_event_metadata does seem
to be a bit of a pain.

Thinking out loud, if we use idr to allocate an event id, as Jan suggested,
and we do want to allow it along side event->fd,
then we could also overload event->pid, i.e. the meaning of
FAN_ERPORT_EVENT_ID would be "event->pid is an event id",
Similarly to the way that we overloaded event->pid with FAN_REPORT_TID.
Users that need both event id and pid can use FAN_REPORT_PIDFD.

>
>  #define FAN_EVENT_INFO_TYPE_FID                1
> @@ -226,6 +229,7 @@ struct fanotify_event_info_mnt {
>
>  struct fanotify_response {
>         __s32 fd;
> +       __u64 event_id;
>         __u32 response;
>  };
>

Most likely we will end up with s32 response id, but if we do decide on
a separate u64 field, please move it to end for better struct alignment.

Thanks,
Amir.





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux