On Fri 29-08-25 14:55:13, Amir Goldstein wrote:
> On Fri, Aug 29, 2025 at 12:50 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Wed 27-08-25 21:43:09, Amir Goldstein wrote:
> > > Commit 620c266f39493 ("fhandle: relax open_by_handle_at() permission
> > > checks") relaxed the conditions for decoding a file handle from non init
> > > userns.
> > >
> > > The conditions are that the decoded dentry is accessible from the user
> > > provided mountfd (or to fs root) and that all the ancestors along the
> > > path have a valid id mapping in the userns.
> > >
> > > These conditions are intentionally more strict than the condition that
> > > the decoded dentry should be "lookable" by path from the mountfd.
> > >
> > > For example, the path /home/amir/dir/subdir is lookable by path from
> > > unpriv userns of user amir, because /home perms is 755, but the owner of
> > > /home does not have a valid id mapping in unpriv userns of user amir.
> > >
> > > The current code did not check that the decoded dentry itself has a
> > > valid id mapping in the userns. There is no security risk in that,
> > > because that final open still performs the needed permission checks,
> > > but this is inconsistent with the checks performed on the ancestors,
> > > so the behavior can be a bit confusing.
> > >
> > > Add the check for the decoded dentry itself, so that the entire path,
> > > including the last component, has a valid id mapping in the userns.
> > >
> > > Fixes: 620c266f39493 ("fhandle: relax open_by_handle_at() permission checks")
> > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> >
> > Yeah, probably it's less surprising this way. Feel free to add:
> >
>
> BTW, Jan, I was trying to think about whether we could do
> something useful with privileged_wrt_inode_uidgid() for filtering
> events that we queue by group->user_ns.
>
> Then users could allow something like:
> 1. Admin sets up privileged fanotify fd and filesystem watch on
>    /home filesystem
> 2. Enters userns of amir and does ioctl to change group->user_ns
>    to user ns of amir
> 3. Hands over fanotify fd to monitor process running in amir's userns
> 4. amir's monitor process gets all events on filesystem /home
>    whose directory and object uid/gid are mappable to amir's userns
> 5. With properly configured systems, that would be all the files/dirs under
>    /home/amir
>
> I have posted several POCs in the past trying different approaches
> for filtering by userns, but I have never tried to take this approach.
>
> Compared to subtree filtering, this could be quite pragmatic? Hmm?

This is definitely relatively easy to implement in the kernel. I'm just
not sure about two things:

1) Will this be easy enough to use from userspace that it actually gets
used? Mount watches were also created as a "partial" solution for subtree
watches, but in practice they didn't see widespread use as a subtree
watch replacement, because setting up a mountpoint for the subtree you
want to watch is not flexible enough. Setting up a userns, id mappings
and proper inode ownership seems like a similar hassle for anything other
than a full home dir...

2) Filtering all events on the fs only by the inode owner being mappable
to the user ns looks somewhat dangerous to me. Sure, you offload the
responsibility for a safe setup to userspace, but because this completely
bypasses any permission checks, configuring the system so that it does
not leak unintended information (like filenames, or the fact that some
things have changed, which the user otherwise wouldn't be able to see)
might be difficult. Consider e.g. a maildir on your monitored fs where,
for some reason, the UID of postfix is mapped into your user ns (e.g.
because the user needs access to some file/dir managed by postfix). Then
you could monitor all fs activity of postfix, possibly learning about
emails to other people on the system.
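To make point 2 concrete, here is a minimal userspace model of the proposed filter. It is a sketch under stated assumptions, not kernel code: `struct id_extent`, `struct id_map`, `struct owner`, `id_mappable()` and `event_visible()` are hypothetical names, the extents follow the "ns_first host_first count" layout of /proc/<pid>/uid_map, and a real implementation would presumably rely on the kernel's own id-mapping helpers (e.g. privileged_wrt_inode_uidgid(), as suggested above) rather than this linear scan. The key property it shows is that only ownership is consulted, never permissions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only (hypothetical, not kernel code): one extent of a
 * uid_map/gid_map in the "ns_first host_first count" layout.
 */
struct id_extent {
	unsigned int ns_first;
	unsigned int host_first;
	unsigned int count;
};

struct id_map {
	const struct id_extent *extents;
	size_t nr;
};

/* Hypothetical inode owner as seen from the initial user namespace. */
struct owner {
	unsigned int uid;
	unsigned int gid;
};

/* True if host_id falls inside one of the map's extents. */
bool id_mappable(const struct id_map *map, unsigned int host_id)
{
	assert(map != NULL);
	for (size_t i = 0; i < map->nr; i++) {
		const struct id_extent *e = &map->extents[i];

		/* Subtract instead of computing host_first + count to avoid overflow. */
		if (host_id >= e->host_first && host_id - e->host_first < e->count)
			return true;
	}
	return false;
}

/*
 * Hypothetical filter: queue an event to the group only if the owner
 * uid/gid of both the parent directory and the object are mappable into
 * the group's user namespace. No permission checks are performed.
 */
bool event_visible(const struct id_map *uids, const struct id_map *gids,
		   struct owner dir, struct owner obj)
{
	return id_mappable(uids, dir.uid) && id_mappable(gids, dir.gid) &&
	       id_mappable(uids, obj.uid) && id_mappable(gids, obj.gid);
}
```

With a map that contains only amir's ID (say host 1000), events under /home/amir pass and events on root-owned /home or on postfix-owned files are dropped. Adding postfix's UID to the same map (the maildir scenario above) makes every postfix event visible, because nothing but ownership is checked. The IDs here are illustrative only.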
> The difference from subtree filtering is that it shifts the responsibility
> of making sure that /home/amir and /home/jack have files with uid,gid
> in different ranges to the OS/runtime, which is a responsibility that
> some systems are already taking care of anyway.

At this point I'm not convinced there are many systems where this way of
filtering would be useful, but I could be wrong. The fact that some ID is
mappable in a namespace looks like a rather weak restriction, because
AFAIU you may need to map various external "system" IDs into the
namespace as well. But I can see that, e.g. for containers, the idea of
restricting events to inodes whose owners are in a range of UIDs may be
attractive.

Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR