On Mon, May 05, 2025 at 12:24:41AM +0100, Al Viro wrote: > On Fri, May 02, 2025 at 10:46:26PM -0500, Eric W. Biederman wrote: > > Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes: > > > > > Back in 2011, when ->d_automount() had been introduced, > > > we went with "stepping on NFS referral, etc., has the submount > > > inherit the flags of parent one" (with the obvious exceptions > > > for internal-only flags). Back then MNT_LOCKED didn't exist. > > > > > > Two years later, when MNT_LOCKED had been added, an explicit > > > "don't set MNT_LOCKED on expirable mounts when propagating across > > > the userns boundary; their underlying mountpoints can be exposed > > > whenever the original expires anyway". Same went for root of > > > subtree attached by explicit mount --[r]bind - the mountpoint > > > had been exposed before the call, after all and for roots of > > > any propagation copies created by such (same reason). Normal mount > > > (created by do_new_mount()) could never get MNT_LOCKED to start with. > > > > > > However, mounts created by finish_automount() bloody well > > > could - if the parent mount had MNT_LOCKED on it, submounts would > > > inherited it. Even if they had been expirable. Moreover, all their > > > propagation copies would have MNT_LOCKED stripped out. > > > > > > IMO this inconsistency is a bug; MNT_LOCKED should not > > > be inherited in finish_automount(). > > > > > > Eric, is there something subtle I'm missing here? > > > > I don't think you are missing anything. This looks like a pretty clear > > cut case of simply not realizing finish_automount was special in a way > > that could result in MNT_LOCKED getting set. > > > > I skimmed through the code just a minute ago and my reading of it > > matches your reading of it above. > > > > The intended semantics of MNT_LOCKED are to not let an unprivileged user > > see under mounts they would never be able to see under without creating > > a mount namespace. > > > > The mount point of an automount is pretty clearly something that is safe > > to see under. Doubly so if this is a directory that will always be > > empty on a pseudo filesystem (aka autofs). > > Does anybody have objections to the following? > > [PATCH] finish_automount(): don't leak MNT_LOCKED from parent to child > > Intention for MNT_LOCKED had always been to protect the internal > mountpoints within a subtree that got copied across the userns boundary, > not the mountpoint that tree got attached to - after all, it _was_ > exposed before the copying. > > For roots of secondary copies that is enforced in attach_recursive_mnt() - > MNT_LOCKED is explicitly stripped for those. For the root of primary > copy we are almost always guaranteed that MNT_LOCKED won't be there, > so attach_recursive_mnt() doesn't bother. Unfortunately, one call > chain got overlooked - triggering e.g. NFS referral will have the /me rolls eyes. Why is it always NFS. > submount inherit the public flags from parent; that's fine for such > things as read-only, nosuid, etc., but not for MNT_LOCKED. > > This is particularly pointless since the mount attached by finish_automount() > is usually expirable, which makes any protection granted by MNT_LOCKED > null and void; just wait for a while and that mount will go away on its own. > > The minimal fix is to have do_add_mount() treat MNT_LOCKED the same > way as other internal-only flags. Longer term it would be cleaner to > deal with that in attach_recursive_mnt(), but that takes a bit more > massage, so let's go with the one-liner fix for now. > > Fixes: 5ff9d8a65ce8 ("vfs: Lock in place mounts from more privileged users") > Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx> > --- Reviewed-by: Christian Brauner <brauner@xxxxxxxxxx> > diff --git a/fs/namespace.c b/fs/namespace.c > index 04a9bb9f31fa..352b4ccf1aaa 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -3761,7 +3761,7 @@ static int do_add_mount(struct mount *newmnt, struct mountpoint *mp, > { > struct mount *parent = real_mount(path->mnt); > > - mnt_flags &= ~MNT_INTERNAL_FLAGS; > + mnt_flags &= ~(MNT_INTERNAL_FLAGS | MNT_LOCKED); > > if (unlikely(!check_mnt(parent))) { > /* that's acceptable only for automounts done in private ns */