Re: [PATCH v2 2/2] fanotify: support watching filesystems and mounts inside userns

Amir Goldstein <amir73il@xxxxxxxxx> · Wed, 14 May 2025 20:39:19 +0200

On Wed, May 14, 2025 at 5:49 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Sat 19-04-25 12:06:57, Amir Goldstein wrote:
> > An unprivileged user is allowed to create an fanotify group and add
> > inode marks, but not filesystem, mntns and mount marks.
> >
> > Add limited support for setting up filesystem, mntns and mount marks by
> > an unprivileged user under the following conditions:
> >
> > 1.   User has CAP_SYS_ADMIN in the user ns where the group was created
> > 2.a. User has CAP_SYS_ADMIN in the user ns where the filesystem was
> >      mounted (implies FS_USERNS_MOUNT)
> >   OR (in case setting up a mntns or mount mark)
> > 2.b. User has CAP_SYS_ADMIN in the user ns associated with the mntns
> >
> > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
>
> I'm sorry for the delay. Both patches look good to me but I'd like somebody
> more versed with user namespaces to double check namespace checks in this
> patch are indeed sound (so that we don't introduce some security issue).

Good idea!

> Christian, can you have a look please?
>

Christian,

Please note that the checks below are loosely modeled after the tests in
may_decode_fh(), with some differences:

> >
> >       /*
> > -      * An unprivileged user is not allowed to setup mount nor filesystem
> > -      * marks.  This also includes setting up such marks by a group that
> > -      * was initialized by an unprivileged user.
> > +      * A user is allowed to setup sb/mount/mntns marks only if it is
> > +      * capable in the user ns where the group was created.
> >        */
> > -     if ((!capable(CAP_SYS_ADMIN) ||
> > -          FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV)) &&
> > +     if (!ns_capable(group->user_ns, CAP_SYS_ADMIN) &&
> >           mark_type != FAN_MARK_INODE)
> >               return -EPERM;
> >

1. This is an extra restriction. Not sure is need to remain forever,
but it reduces
    attack surface and does not limit the common use cases,
    so I think it's worth to have.

> > @@ -1987,12 +1988,27 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
> >               obj = inode;
> >       } else if (obj_type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) {
> >               obj = path.mnt;
> > +             user_ns = real_mount(obj)->mnt_ns->user_ns;
> >       } else if (obj_type == FSNOTIFY_OBJ_TYPE_SB) {
> >               obj = path.mnt->mnt_sb;
> > +             user_ns = path.mnt->mnt_sb->s_user_ns;
> >       } else if (obj_type == FSNOTIFY_OBJ_TYPE_MNTNS) {
> >               obj = mnt_ns_from_dentry(path.dentry);
> > +             user_ns = ((struct mnt_namespace *)obj)->user_ns;
> >       }
> >
> > +     /*
> > +      * In addition to being capable in the user ns where group was created,
> > +      * the user also needs to be capable in the user ns associated with
> > +      * the marked filesystem (for FS_USERNS_MOUNT filesystems) or in the
> > +      * user ns associated with the mntns (when marking a mount or mntns).
> > +      * This is aligned with the required permissions to open_by_handle_at()
> > +      * a directory fid provided with the events.
> > +      */
> > +     ret = -EPERM;
> > +     if (user_ns && !ns_capable(user_ns, CAP_SYS_ADMIN))
> > +             goto path_put_and_out;
> > +

2. In may_decode_fh() we know the mount that resulting file will be
    opened from so we accept
    Either capable mnt->mnt_sb->s_user_ns
    OR capable real_mount(mnt)->mnt_ns->user_ns
whereas here we only check capable mnt->mnt_sb->s_user_ns
   when subscribing to fs events on sb
   and only check capable real_mount(mnt)->mnt_ns->user_ns
   when subscribing to fs events on a specific mount

I am not sure if there is a use case to allow watching events on a
specific mount for capable mnt->mnt_sb->s_user_ns where
real_mount(mnt)->mnt_ns->user_ns is not capable
or if that setup is even possible?

3. Unlike may_decode_fh(), we do not check has_locked_children()
    Not sure how bad it is to allow receiving events for fs changes in
    obstructed paths (with file handles that cannot be opened).
If this case does needs to be checked  then perhaps we should
check for capable mnt->mnt_sb->s_user_ns also when subscribing
to fs events on a mount.

Thanks,
Amir.