On Wed, May 14, 2025 at 8:39 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > On Wed, May 14, 2025 at 5:49 PM Jan Kara <jack@xxxxxxx> wrote: > > > > On Sat 19-04-25 12:06:57, Amir Goldstein wrote: > > > An unprivileged user is allowed to create an fanotify group and add > > > inode marks, but not filesystem, mntns and mount marks. > > > > > > Add limited support for setting up filesystem, mntns and mount marks by > > > an unprivileged user under the following conditions: > > > > > > 1. User has CAP_SYS_ADMIN in the user ns where the group was created > > > 2.a. User has CAP_SYS_ADMIN in the user ns where the filesystem was > > > mounted (implies FS_USERNS_MOUNT) > > > OR (in case setting up a mntns or mount mark) > > > 2.b. User has CAP_SYS_ADMIN in the user ns associated with the mntns > > > > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > > > I'm sorry for the delay. Both patches look good to me but I'd like somebody > > more versed with user namespaces to double check namespace checks in this > > patch are indeed sound (so that we don't introduce some security issue). > > Good idea! > > > Christian, can you have a look please? > > > > Christian, > > Please note that the checks below are loosely modeled after the tests in > may_decode_fh(), with some differences: > > > > > > > /* > > > - * An unprivileged user is not allowed to setup mount nor filesystem > > > - * marks. This also includes setting up such marks by a group that > > > - * was initialized by an unprivileged user. > > > + * A user is allowed to setup sb/mount/mntns marks only if it is > > > + * capable in the user ns where the group was created. > > > */ > > > - if ((!capable(CAP_SYS_ADMIN) || > > > - FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV)) && > > > + if (!ns_capable(group->user_ns, CAP_SYS_ADMIN) && > > > mark_type != FAN_MARK_INODE) > > > return -EPERM; > > > > > 1. This is an extra restriction. Not sure is need to remain forever, > but it reduces > attack surface and does not limit the common use cases, > so I think it's worth to have. > > > > @@ -1987,12 +1988,27 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, > > > obj = inode; > > > } else if (obj_type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) { > > > obj = path.mnt; > > > + user_ns = real_mount(obj)->mnt_ns->user_ns; > > > } else if (obj_type == FSNOTIFY_OBJ_TYPE_SB) { > > > obj = path.mnt->mnt_sb; > > > + user_ns = path.mnt->mnt_sb->s_user_ns; > > > } else if (obj_type == FSNOTIFY_OBJ_TYPE_MNTNS) { > > > obj = mnt_ns_from_dentry(path.dentry); > > > + user_ns = ((struct mnt_namespace *)obj)->user_ns; > > > } > > > > > > + /* > > > + * In addition to being capable in the user ns where group was created, > > > + * the user also needs to be capable in the user ns associated with > > > + * the marked filesystem (for FS_USERNS_MOUNT filesystems) or in the > > > + * user ns associated with the mntns (when marking a mount or mntns). > > > + * This is aligned with the required permissions to open_by_handle_at() > > > + * a directory fid provided with the events. > > > + */ > > > + ret = -EPERM; > > > + if (user_ns && !ns_capable(user_ns, CAP_SYS_ADMIN)) > > > + goto path_put_and_out; > > > + > > 2. In may_decode_fh() we know the mount that resulting file will be > opened from so we accept > Either capable mnt->mnt_sb->s_user_ns > OR capable real_mount(mnt)->mnt_ns->user_ns > whereas here we only check capable mnt->mnt_sb->s_user_ns > when subscribing to fs events on sb > and only check capable real_mount(mnt)->mnt_ns->user_ns > when subscribing to fs events on a specific mount > > I am not sure if there is a use case to allow watching events on a > specific mount for capable mnt->mnt_sb->s_user_ns where > real_mount(mnt)->mnt_ns->user_ns is not capable > or if that setup is even possible? > > 3. Unlike may_decode_fh(), we do not check has_locked_children() > Not sure how bad it is to allow receiving events for fs changes in > obstructed paths (with file handles that cannot be opened). > If this case does needs to be checked then perhaps we should > check for capable mnt->mnt_sb->s_user_ns also when subscribing > to fs events on a mount. > OK, after some thinking I think it is best to align the logic to match may_decode_fh() more closely, like this: @@ -1986,13 +1987,40 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, inode = path.dentry->d_inode; obj = inode; } else if (obj_type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) { + struct mount *mnt = real_mount(path.mnt); + obj = path.mnt; + user_ns = path.mnt->mnt_sb->s_user_ns; + /* + * Do not allow watching a mount with locked mounts on top + * that could be hiding access to paths, unless user is also + * capable on the user ns that created the sb. + */ + if (!ns_capable(user_ns, CAP_SYS_ADMIN) && + !has_locked_children(mnt, path.mnt->mnt_root)) + user_ns = mnt->mnt_ns->user_ns; } else if (obj_type == FSNOTIFY_OBJ_TYPE_SB) { obj = path.mnt->mnt_sb; + user_ns = path.mnt->mnt_sb->s_user_ns; } else if (obj_type == FSNOTIFY_OBJ_TYPE_MNTNS) { - obj = mnt_ns_from_dentry(path.dentry); + struct mnt_namespace *mntns = mnt_ns_from_dentry(path.dentry); + + obj = mntns; + user_ns = mntns->user_ns; } + /* + * In addition to being capable in the user ns where group was created, + * the user also needs to be capable in the user ns associated with + * the marked filesystem or in the user ns associated with the mntns + * (when marking a mount or mntns). + * This is aligned with the required permissions to open_by_handle_at() + * a directory fid provided with the events. + */ + ret = -EPERM; + if (user_ns && !ns_capable(user_ns, CAP_SYS_ADMIN)) + goto path_put_and_out; + Thanks, Amir.