On Thu, Jul 24, 2025 at 04:49:24PM +0200, Günther Noack wrote: > On Wed, Jul 23, 2025 at 11:01:42PM +0200, Mickaël Salaün wrote: > > On Tue, Jul 22, 2025 at 07:04:02PM +0100, Tingmao Wang wrote: > > > On the other hand, I'm still a bit uncertain about the domain check > > > semantics. While it would not cause a rename to be allowed if it is > > > otherwise not allowed by any rules on or above the mountpoint, this gets a > > > bit weird if we have a situation where renames are allowed on the > > > mountpoint or everywhere, but not read/writes, however read/writes are > > > allowed directly on a file, but the dir containing that file gets > > > disconnected so the sandboxed application can't read or write to it. > > > (Maybe someone would set up such a policy where renames are allowed, > > > expecting Landlock to always prevent renames where additional permissions > > > would be exposed?) > > > > > > In the above situation, if the file is then moved to a connected > > > directory, it will become readable/writable again. > > > > We can generalize this issue to not only the end file but any component > > of the path: disconnected directories. In fact, the main issue is the > > potential inconsistency of access checks over time (e.g. between two > > renames). This could be exploited to bypass the security checks done > > for FS_REFER. > > > > I see two solutions: > > > > 1. *Always* walk down to the IS_ROOT directory, and then jump to the > > mount point. This makes it possible to have consistent access checks > > for renames and open/use. The first downside is that that would > > change the current behavior for bind mounts that could get more > > access rights (if the policy explicitly sets rights for the hidden > > directories). The second downside is that we'll do more walk. > > > > 2. Return -EACCES (or -ENOENT) for actions involving disconnected > > directories, or renames of disconnected opened files. This second > > solution is simpler and safer but completely disables the use of > > disconnected directories and the rename of disconnected files for > > sandboxed processes. > > > > It would be much better to be able to handle opened directories as > > (object) capabilities, but that is not currently possible because of the > > way paths are handled by the VFS and LSM hooks. > > > > Tingmao, Günther, Jann, what do you think? > > I have to admit that so far, I still failed to wrap my head around the > full patch set and its possible corner cases. I hope I did not > misunderstand things all too badly below: > > As far as I understand the proposed patch, we are "checkpointing" the > intermediate results of the path walk at every mount point boundary, > and in the case where we run into a disconnected directory in one of > the nested mount points, we restore from the intermediate result at > the previous mount point directory and skip to the next mount point. Correct > > Visually speaking, if the layout is this (where ":" denotes a > mountpoint boundary between the mountpoints MP1, MP2, MP3): > > dirfd > | > : V : > : ham <--- spam <--- eggs <--- x.txt > : (disconn.) : > : : > / <--- foo <--- bar <--- baz : > : : > MP1 MP2 MP3 > > When a process holds a reference to the "spam" directory, which is now > disconnected, and invokes openat(dirfd, "eggs/x.txt", ...), then we > would: > > * traverse x.txt > * traverse eggs (checkpointing the intermediate result) <-. > * traverse spam | > * traverse ham | > * discover that ham is disconnected: | > * restore the intermediate result from "eggs" ---------' > * continue the walk at foo > * end up at the root > > So effectively, since the results from "spam" and "ham" are discarded, > we would traverse only the inodes in the outer and inner mountpoints > MP1 and MP3, but effectively return a result that looks like we did > not traverse MP2? We'd still check MP2's inode, but otherwise yes. > > Maybe (likely) I misread the code. :) It's not clear to me what the > thinking behind this is. Also, if there was another directory in > between "spam" and "eggs" in MP2, wouldn't we be missing the access > rights attached to this directory? Yes, we would ignore this access right because we don't know that the path was resolved from spam. > > > Regarding the capability approach: > > I agree that a "capability" approach would be the better solution, but > it seems infeasible with the existing LSM hooks at the moment. I > would be in favor of it though. Yes, it would be a new feature with potential important changes. In the meantime, we still need a fix for disconnected directories, and this fix needs to be backported. That's why the capability approach is not part of the two solutions. ;) > > To spell it out a bit more explicitly what that would mean in my mind: > > When a path is looked up relative to a dirfd, the path walk upwards > would terminate at the dirfd and use previously calculated access > rights stored in the associated struct file. These access rights > would be determined at the time of opening the dirfd, similar to how we > are already storing the "truncate" access right today for regular > files. > > (Remark: There might still be corner cases where we have to solve it > the hard way, if someone uses ".." together with a dirfd-relative > lookup.) Yep, real capabilities don't have ".." in their design. On Linux (and Landlock), we need to properly handle "..", which is challenging. > > I also looked at what it would take to change the LSM hooks to pass > the directory that the lookup was done relative to, but it seems that > this would have to be passed through a bunch of VFS callbacks as well, > which seems like a larger change. I would be curious whether that > would be deemed an acceptable change. > > —Günther > > > P.S. Related to relative directory lookups, there is some movement in > the BSDs as well to use dirfds as capabilities, by adding a flag to > open directories that enforces O_BENEATH on subsequent opens: > > * https://undeadly.org/cgi?action=article;sid=20250529080623 > * https://reviews.freebsd.org/D50371 > > (both found via https://news.ycombinator.com/item?id=44575361) > > If a dirfd had such a flag, that would get rid of the corner case > above. This would be nice but it would not solve the current issue because we cannot force all processes to use this flag (which breaks some use cases). FYI, Capsicum is a more complete implementation: https://man.freebsd.org/cgi/man.cgi?query=capsicum&sektion=4 See the vfs.lookup_cap_dotdot sysctl too.