On Wed, May 14, 2025 at 5:49 PM Jan Kara <jack@xxxxxxx> wrote: > > On Sat 19-04-25 12:06:57, Amir Goldstein wrote: > > An unprivileged user is allowed to create an fanotify group and add > > inode marks, but not filesystem, mntns and mount marks. > > > > Add limited support for setting up filesystem, mntns and mount marks by > > an unprivileged user under the following conditions: > > > > 1. User has CAP_SYS_ADMIN in the user ns where the group was created > > 2.a. User has CAP_SYS_ADMIN in the user ns where the filesystem was > > mounted (implies FS_USERNS_MOUNT) > > OR (in case setting up a mntns or mount mark) > > 2.b. User has CAP_SYS_ADMIN in the user ns associated with the mntns > > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > I'm sorry for the delay. Both patches look good to me but I'd like somebody > more versed with user namespaces to double check namespace checks in this > patch are indeed sound (so that we don't introduce some security issue). Good idea! > Christian, can you have a look please? > Christian, Please note that the checks below are loosely modeled after the tests in may_decode_fh(), with some differences: > > > > /* > > - * An unprivileged user is not allowed to setup mount nor filesystem > > - * marks. This also includes setting up such marks by a group that > > - * was initialized by an unprivileged user. > > + * A user is allowed to setup sb/mount/mntns marks only if it is > > + * capable in the user ns where the group was created. > > */ > > - if ((!capable(CAP_SYS_ADMIN) || > > - FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV)) && > > + if (!ns_capable(group->user_ns, CAP_SYS_ADMIN) && > > mark_type != FAN_MARK_INODE) > > return -EPERM; > > 1. This is an extra restriction. Not sure is need to remain forever, but it reduces attack surface and does not limit the common use cases, so I think it's worth to have. > > @@ -1987,12 +1988,27 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, > > obj = inode; > > } else if (obj_type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) { > > obj = path.mnt; > > + user_ns = real_mount(obj)->mnt_ns->user_ns; > > } else if (obj_type == FSNOTIFY_OBJ_TYPE_SB) { > > obj = path.mnt->mnt_sb; > > + user_ns = path.mnt->mnt_sb->s_user_ns; > > } else if (obj_type == FSNOTIFY_OBJ_TYPE_MNTNS) { > > obj = mnt_ns_from_dentry(path.dentry); > > + user_ns = ((struct mnt_namespace *)obj)->user_ns; > > } > > > > + /* > > + * In addition to being capable in the user ns where group was created, > > + * the user also needs to be capable in the user ns associated with > > + * the marked filesystem (for FS_USERNS_MOUNT filesystems) or in the > > + * user ns associated with the mntns (when marking a mount or mntns). > > + * This is aligned with the required permissions to open_by_handle_at() > > + * a directory fid provided with the events. > > + */ > > + ret = -EPERM; > > + if (user_ns && !ns_capable(user_ns, CAP_SYS_ADMIN)) > > + goto path_put_and_out; > > + 2. In may_decode_fh() we know the mount that resulting file will be opened from so we accept Either capable mnt->mnt_sb->s_user_ns OR capable real_mount(mnt)->mnt_ns->user_ns whereas here we only check capable mnt->mnt_sb->s_user_ns when subscribing to fs events on sb and only check capable real_mount(mnt)->mnt_ns->user_ns when subscribing to fs events on a specific mount I am not sure if there is a use case to allow watching events on a specific mount for capable mnt->mnt_sb->s_user_ns where real_mount(mnt)->mnt_ns->user_ns is not capable or if that setup is even possible? 3. Unlike may_decode_fh(), we do not check has_locked_children() Not sure how bad it is to allow receiving events for fs changes in obstructed paths (with file handles that cannot be opened). If this case does needs to be checked then perhaps we should check for capable mnt->mnt_sb->s_user_ns also when subscribing to fs events on a mount. Thanks, Amir.