Re: [PATCH v2 2/2] fanotify: support watching filesystems and mounts inside userns

Amir Goldstein <amir73il@xxxxxxxxx> · Fri, 16 May 2025 19:28:58 +0200




On Wed, May 14, 2025 at 8:39 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Wed, May 14, 2025 at 5:49 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Sat 19-04-25 12:06:57, Amir Goldstein wrote:
> > > An unprivileged user is allowed to create an fanotify group and add
> > > inode marks, but not filesystem, mntns and mount marks.
> > >
> > > Add limited support for setting up filesystem, mntns and mount marks by
> > > an unprivileged user under the following conditions:
> > >
> > > 1.   User has CAP_SYS_ADMIN in the user ns where the group was created
> > > 2.a. User has CAP_SYS_ADMIN in the user ns where the filesystem was
> > >      mounted (implies FS_USERNS_MOUNT)
> > >   OR (in case setting up a mntns or mount mark)
> > > 2.b. User has CAP_SYS_ADMIN in the user ns associated with the mntns
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> >
> > I'm sorry for the delay. Both patches look good to me but I'd like somebody
> > more versed with user namespaces to double check namespace checks in this
> > patch are indeed sound (so that we don't introduce some security issue).
>
> Good idea!
>
> > Christian, can you have a look please?
> >
>
> Christian,
>
> Please note that the checks below are loosely modeled after the tests in
> may_decode_fh(), with some differences:
>
> > >
> > >       /*
> > > -      * An unprivileged user is not allowed to setup mount nor filesystem
> > > -      * marks.  This also includes setting up such marks by a group that
> > > -      * was initialized by an unprivileged user.
> > > +      * A user is allowed to setup sb/mount/mntns marks only if it is
> > > +      * capable in the user ns where the group was created.
> > >        */
> > > -     if ((!capable(CAP_SYS_ADMIN) ||
> > > -          FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV)) &&
> > > +     if (!ns_capable(group->user_ns, CAP_SYS_ADMIN) &&
> > >           mark_type != FAN_MARK_INODE)
> > >               return -EPERM;
> > >
>
> 1. This is an extra restriction. Not sure is need to remain forever,
> but it reduces
>     attack surface and does not limit the common use cases,
>     so I think it's worth to have.
>
> > > @@ -1987,12 +1988,27 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
> > >               obj = inode;
> > >       } else if (obj_type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) {
> > >               obj = path.mnt;
> > > +             user_ns = real_mount(obj)->mnt_ns->user_ns;
> > >       } else if (obj_type == FSNOTIFY_OBJ_TYPE_SB) {
> > >               obj = path.mnt->mnt_sb;
> > > +             user_ns = path.mnt->mnt_sb->s_user_ns;
> > >       } else if (obj_type == FSNOTIFY_OBJ_TYPE_MNTNS) {
> > >               obj = mnt_ns_from_dentry(path.dentry);
> > > +             user_ns = ((struct mnt_namespace *)obj)->user_ns;
> > >       }
> > >
> > > +     /*
> > > +      * In addition to being capable in the user ns where group was created,
> > > +      * the user also needs to be capable in the user ns associated with
> > > +      * the marked filesystem (for FS_USERNS_MOUNT filesystems) or in the
> > > +      * user ns associated with the mntns (when marking a mount or mntns).
> > > +      * This is aligned with the required permissions to open_by_handle_at()
> > > +      * a directory fid provided with the events.
> > > +      */
> > > +     ret = -EPERM;
> > > +     if (user_ns && !ns_capable(user_ns, CAP_SYS_ADMIN))
> > > +             goto path_put_and_out;
> > > +
>
> 2. In may_decode_fh() we know the mount that resulting file will be
>     opened from so we accept
>     Either capable mnt->mnt_sb->s_user_ns
>     OR capable real_mount(mnt)->mnt_ns->user_ns
> whereas here we only check capable mnt->mnt_sb->s_user_ns
>    when subscribing to fs events on sb
>    and only check capable real_mount(mnt)->mnt_ns->user_ns
>    when subscribing to fs events on a specific mount
>
> I am not sure if there is a use case to allow watching events on a
> specific mount for capable mnt->mnt_sb->s_user_ns where
> real_mount(mnt)->mnt_ns->user_ns is not capable
> or if that setup is even possible?
>
> 3. Unlike may_decode_fh(), we do not check has_locked_children()
>     Not sure how bad it is to allow receiving events for fs changes in
>     obstructed paths (with file handles that cannot be opened).
> If this case does needs to be checked  then perhaps we should
> check for capable mnt->mnt_sb->s_user_ns also when subscribing
> to fs events on a mount.
>

OK, after some thinking I think it is best to align the logic to match
may_decode_fh() more closely, like this:

@@ -1986,13 +1987,40 @@ static int do_fanotify_mark(int fanotify_fd,
unsigned int flags, __u64 mask,
                inode = path.dentry->d_inode;
                obj = inode;
        } else if (obj_type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) {
+               struct mount *mnt = real_mount(path.mnt);
+
                obj = path.mnt;
+               user_ns = path.mnt->mnt_sb->s_user_ns;
+               /*
+                * Do not allow watching a mount with locked mounts on top
+                * that could be hiding access to paths, unless user is also
+                * capable on the user ns that created the sb.
+                */
+               if (!ns_capable(user_ns, CAP_SYS_ADMIN) &&
+                   !has_locked_children(mnt, path.mnt->mnt_root))
+                       user_ns = mnt->mnt_ns->user_ns;
        } else if (obj_type == FSNOTIFY_OBJ_TYPE_SB) {
                obj = path.mnt->mnt_sb;
+               user_ns = path.mnt->mnt_sb->s_user_ns;
        } else if (obj_type == FSNOTIFY_OBJ_TYPE_MNTNS) {
-               obj = mnt_ns_from_dentry(path.dentry);
+               struct mnt_namespace *mntns = mnt_ns_from_dentry(path.dentry);
+
+               obj = mntns;
+               user_ns = mntns->user_ns;
        }

+       /*
+        * In addition to being capable in the user ns where group was created,
+        * the user also needs to be capable in the user ns associated with
+        * the marked filesystem or in the user ns associated with the mntns
+        * (when marking a mount or mntns).
+        * This is aligned with the required permissions to open_by_handle_at()
+        * a directory fid provided with the events.
+        */
+       ret = -EPERM;
+       if (user_ns && !ns_capable(user_ns, CAP_SYS_ADMIN))
+               goto path_put_and_out;
+

Thanks,
Amir.