On Fri, May 02, 2025 at 10:46:26PM -0500, Eric W. Biederman wrote: > Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes: > > > Back in 2011, when ->d_automount() had been introduced, > > we went with "stepping on NFS referral, etc., has the submount > > inherit the flags of parent one" (with the obvious exceptions > > for internal-only flags). Back then MNT_LOCKED didn't exist. > > > > Two years later, when MNT_LOCKED had been added, an explicit > > "don't set MNT_LOCKED on expirable mounts when propagating across > > the userns boundary; their underlying mountpoints can be exposed > > whenever the original expires anyway". Same went for root of > > subtree attached by explicit mount --[r]bind - the mountpoint > > had been exposed before the call, after all and for roots of > > any propagation copies created by such (same reason). Normal mount > > (created by do_new_mount()) could never get MNT_LOCKED to start with. > > > > However, mounts created by finish_automount() bloody well > > could - if the parent mount had MNT_LOCKED on it, submounts would > > inherited it. Even if they had been expirable. Moreover, all their > > propagation copies would have MNT_LOCKED stripped out. > > > > IMO this inconsistency is a bug; MNT_LOCKED should not > > be inherited in finish_automount(). > > > > Eric, is there something subtle I'm missing here? > > I don't think you are missing anything. This looks like a pretty clear > cut case of simply not realizing finish_automount was special in a way > that could result in MNT_LOCKED getting set. > > I skimmed through the code just a minute ago and my reading of it > matches your reading of it above. > > The intended semantics of MNT_LOCKED are to not let an unprivileged user > see under mounts they would never be able to see under without creating > a mount namespace. > > The mount point of an automount is pretty clearly something that is safe > to see under. Doubly so if this is a directory that will always be > empty on a pseudo filesystem (aka autofs). Does anybody have objections to the following? [PATCH] finish_automount(): don't leak MNT_LOCKED from parent to child Intention for MNT_LOCKED had always been to protect the internal mountpoints within a subtree that got copied across the userns boundary, not the mountpoint that tree got attached to - after all, it _was_ exposed before the copying. For roots of secondary copies that is enforced in attach_recursive_mnt() - MNT_LOCKED is explicitly stripped for those. For the root of primary copy we are almost always guaranteed that MNT_LOCKED won't be there, so attach_recursive_mnt() doesn't bother. Unfortunately, one call chain got overlooked - triggering e.g. NFS referral will have the submount inherit the public flags from parent; that's fine for such things as read-only, nosuid, etc., but not for MNT_LOCKED. This is particularly pointless since the mount attached by finish_automount() is usually expirable, which makes any protection granted by MNT_LOCKED null and void; just wait for a while and that mount will go away on its own. The minimal fix is to have do_add_mount() treat MNT_LOCKED the same way as other internal-only flags. Longer term it would be cleaner to deal with that in attach_recursive_mnt(), but that takes a bit more massage, so let's go with the one-liner fix for now. Fixes: 5ff9d8a65ce8 ("vfs: Lock in place mounts from more privileged users") Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx> --- diff --git a/fs/namespace.c b/fs/namespace.c index 04a9bb9f31fa..352b4ccf1aaa 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -3761,7 +3761,7 @@ static int do_add_mount(struct mount *newmnt, struct mountpoint *mp, { struct mount *parent = real_mount(path->mnt); - mnt_flags &= ~MNT_INTERNAL_FLAGS; + mnt_flags &= ~(MNT_INTERNAL_FLAGS | MNT_LOCKED); if (unlikely(!check_mnt(parent))) { /* that's acceptable only for automounts done in private ns */