Re: [RFC] MNT_LOCKED vs. finish_automount()

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Mon, 5 May 2025 00:24:41 +0100

On Fri, May 02, 2025 at 10:46:26PM -0500, Eric W. Biederman wrote:
> Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:
> 
> > 	Back in 2011, when ->d_automount() had been introduced,
> > we went with "stepping on NFS referral, etc., has the submount
> > inherit the flags of parent one" (with the obvious exceptions
> > for internal-only flags).  Back then MNT_LOCKED didn't exist.
> >
> > 	Two years later, when MNT_LOCKED had been added, an explicit
> > "don't set MNT_LOCKED on expirable mounts when propagating across
> > the userns boundary; their underlying mountpoints can be exposed
> > whenever the original expires anyway".  Same went for root of
> > subtree attached by explicit mount --[r]bind - the mountpoint
> > had been exposed before the call, after all and for roots of
> > any propagation copies created by such (same reason).  Normal mount
> > (created by do_new_mount()) could never get MNT_LOCKED to start with.
> >
> > 	However, mounts created by finish_automount() bloody well
> > could - if the parent mount had MNT_LOCKED on it, submounts would
> > inherited it.  Even if they had been expirable.  Moreover, all their
> > propagation copies would have MNT_LOCKED stripped out.
> >
> > 	IMO this inconsistency is a bug; MNT_LOCKED should not
> > be inherited in finish_automount().
> >
> > 	Eric, is there something subtle I'm missing here?
> 
> I don't think you are missing anything.  This looks like a pretty clear
> cut case of simply not realizing finish_automount was special in a way
> that could result in MNT_LOCKED getting set.
> 
> I skimmed through the code just a minute ago and my reading of it
> matches your reading of it above.
> 
> The intended semantics of MNT_LOCKED are to not let an unprivileged user
> see under mounts they would never be able to see under without creating
> a mount namespace.
> 
> The mount point of an automount is pretty clearly something that is safe
> to see under.  Doubly so if this is a directory that will always be
> empty on a pseudo filesystem (aka autofs).

Does anybody have objections to the following?

[PATCH] finish_automount(): don't leak MNT_LOCKED from parent to child

Intention for MNT_LOCKED had always been to protect the internal
mountpoints within a subtree that got copied across the userns boundary,
not the mountpoint that tree got attached to - after all, it _was_
exposed before the copying.

For roots of secondary copies that is enforced in attach_recursive_mnt() -
MNT_LOCKED is explicitly stripped for those.  For the root of primary
copy we are almost always guaranteed that MNT_LOCKED won't be there,
so attach_recursive_mnt() doesn't bother.  Unfortunately, one call
chain got overlooked - triggering e.g. NFS referral will have the
submount inherit the public flags from parent; that's fine for such
things as read-only, nosuid, etc., but not for MNT_LOCKED.

This is particularly pointless since the mount attached by finish_automount()
is usually expirable, which makes any protection granted by MNT_LOCKED
null and void; just wait for a while and that mount will go away on its own.

The minimal fix is to have do_add_mount() treat MNT_LOCKED the same
way as other internal-only flags.  Longer term it would be cleaner to
deal with that in attach_recursive_mnt(), but that takes a bit more
massage, so let's go with the one-liner fix for now.

Fixes: 5ff9d8a65ce8 ("vfs: Lock in place mounts from more privileged users")
Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
---

diff --git a/fs/namespace.c b/fs/namespace.c
index 04a9bb9f31fa..352b4ccf1aaa 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3761,7 +3761,7 @@ static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
 {
 	struct mount *parent = real_mount(path->mnt);
 
-	mnt_flags &= ~MNT_INTERNAL_FLAGS;
+	mnt_flags &= ~(MNT_INTERNAL_FLAGS | MNT_LOCKED);
 
 	if (unlikely(!check_mnt(parent))) {
 		/* that's acceptable only for automounts done in private ns */