On Fri, May 16, 2025 at 7:28 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > On Wed, May 14, 2025 at 8:39 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > On Wed, May 14, 2025 at 5:49 PM Jan Kara <jack@xxxxxxx> wrote: > > > > > > On Sat 19-04-25 12:06:57, Amir Goldstein wrote: > > > > An unprivileged user is allowed to create an fanotify group and add > > > > inode marks, but not filesystem, mntns and mount marks. > > > > > > > > Add limited support for setting up filesystem, mntns and mount marks by > > > > an unprivileged user under the following conditions: > > > > > > > > 1. User has CAP_SYS_ADMIN in the user ns where the group was created > > > > 2.a. User has CAP_SYS_ADMIN in the user ns where the filesystem was > > > > mounted (implies FS_USERNS_MOUNT) > > > > OR (in case setting up a mntns or mount mark) > > > > 2.b. User has CAP_SYS_ADMIN in the user ns associated with the mntns > > > > > > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > > > > > I'm sorry for the delay. Both patches look good to me but I'd like somebody > > > more versed with user namespaces to double check namespace checks in this > > > patch are indeed sound (so that we don't introduce some security issue). > > > > Good idea! > > > > > Christian, can you have a look please? > > > > > > > Christian, > > > > Please note that the checks below are loosely modeled after the tests in > > may_decode_fh(), with some differences: > > > > > > > > > > /* > > > > - * An unprivileged user is not allowed to setup mount nor filesystem > > > > - * marks. This also includes setting up such marks by a group that > > > > - * was initialized by an unprivileged user. > > > > + * A user is allowed to setup sb/mount/mntns marks only if it is > > > > + * capable in the user ns where the group was created. > > > > */ > > > > - if ((!capable(CAP_SYS_ADMIN) || > > > > - FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV)) && > > > > + if (!ns_capable(group->user_ns, CAP_SYS_ADMIN) && > > > > mark_type != FAN_MARK_INODE) > > > > return -EPERM; > > > > > > > > 1. This is an extra restriction. Not sure is need to remain forever, > > but it reduces > > attack surface and does not limit the common use cases, > > so I think it's worth to have. > > > > > > @@ -1987,12 +1988,27 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, > > > > obj = inode; > > > > } else if (obj_type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) { > > > > obj = path.mnt; > > > > + user_ns = real_mount(obj)->mnt_ns->user_ns; > > > > } else if (obj_type == FSNOTIFY_OBJ_TYPE_SB) { > > > > obj = path.mnt->mnt_sb; > > > > + user_ns = path.mnt->mnt_sb->s_user_ns; > > > > } else if (obj_type == FSNOTIFY_OBJ_TYPE_MNTNS) { > > > > obj = mnt_ns_from_dentry(path.dentry); > > > > + user_ns = ((struct mnt_namespace *)obj)->user_ns; > > > > } > > > > > > > > + /* > > > > + * In addition to being capable in the user ns where group was created, > > > > + * the user also needs to be capable in the user ns associated with > > > > + * the marked filesystem (for FS_USERNS_MOUNT filesystems) or in the > > > > + * user ns associated with the mntns (when marking a mount or mntns). > > > > + * This is aligned with the required permissions to open_by_handle_at() > > > > + * a directory fid provided with the events. > > > > + */ > > > > + ret = -EPERM; > > > > + if (user_ns && !ns_capable(user_ns, CAP_SYS_ADMIN)) > > > > + goto path_put_and_out; > > > > + > > > > 2. In may_decode_fh() we know the mount that resulting file will be > > opened from so we accept > > Either capable mnt->mnt_sb->s_user_ns > > OR capable real_mount(mnt)->mnt_ns->user_ns > > whereas here we only check capable mnt->mnt_sb->s_user_ns > > when subscribing to fs events on sb > > and only check capable real_mount(mnt)->mnt_ns->user_ns > > when subscribing to fs events on a specific mount > > > > I am not sure if there is a use case to allow watching events on a > > specific mount for capable mnt->mnt_sb->s_user_ns where > > real_mount(mnt)->mnt_ns->user_ns is not capable > > or if that setup is even possible? > > > > 3. Unlike may_decode_fh(), we do not check has_locked_children() > > Not sure how bad it is to allow receiving events for fs changes in > > obstructed paths (with file handles that cannot be opened). > > If this case does needs to be checked then perhaps we should > > check for capable mnt->mnt_sb->s_user_ns also when subscribing > > to fs events on a mount. > > > > OK, after some thinking I think it is best to align the logic to match > may_decode_fh() more closely, like this: > > @@ -1986,13 +1987,40 @@ static int do_fanotify_mark(int fanotify_fd, > unsigned int flags, __u64 mask, > inode = path.dentry->d_inode; > obj = inode; > } else if (obj_type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) { > + struct mount *mnt = real_mount(path.mnt); > + > obj = path.mnt; > + user_ns = path.mnt->mnt_sb->s_user_ns; > + /* > + * Do not allow watching a mount with locked mounts on top > + * that could be hiding access to paths, unless user is also > + * capable on the user ns that created the sb. > + */ > + if (!ns_capable(user_ns, CAP_SYS_ADMIN) && > + !has_locked_children(mnt, path.mnt->mnt_root)) > + user_ns = mnt->mnt_ns->user_ns; No scratch that. I will send v2 that is much simpler. Thanks, Amir.