Re: [RFC] MNT_LOCKED vs. finish_automount()

Christian Brauner <brauner@xxxxxxxxxx> · Mon, 5 May 2025 15:54:44 +0200

On Mon, May 05, 2025 at 12:24:41AM +0100, Al Viro wrote:
> On Fri, May 02, 2025 at 10:46:26PM -0500, Eric W. Biederman wrote:
> > Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:
> > 
> > > 	Back in 2011, when ->d_automount() had been introduced,
> > > we went with "stepping on NFS referral, etc., has the submount
> > > inherit the flags of parent one" (with the obvious exceptions
> > > for internal-only flags).  Back then MNT_LOCKED didn't exist.
> > >
> > > 	Two years later, when MNT_LOCKED had been added, an explicit
> > > "don't set MNT_LOCKED on expirable mounts when propagating across
> > > the userns boundary; their underlying mountpoints can be exposed
> > > whenever the original expires anyway".  Same went for root of
> > > subtree attached by explicit mount --[r]bind - the mountpoint
> > > had been exposed before the call, after all and for roots of
> > > any propagation copies created by such (same reason).  Normal mount
> > > (created by do_new_mount()) could never get MNT_LOCKED to start with.
> > >
> > > 	However, mounts created by finish_automount() bloody well
> > > could - if the parent mount had MNT_LOCKED on it, submounts would
> > > inherited it.  Even if they had been expirable.  Moreover, all their
> > > propagation copies would have MNT_LOCKED stripped out.
> > >
> > > 	IMO this inconsistency is a bug; MNT_LOCKED should not
> > > be inherited in finish_automount().
> > >
> > > 	Eric, is there something subtle I'm missing here?
> > 
> > I don't think you are missing anything.  This looks like a pretty clear
> > cut case of simply not realizing finish_automount was special in a way
> > that could result in MNT_LOCKED getting set.
> > 
> > I skimmed through the code just a minute ago and my reading of it
> > matches your reading of it above.
> > 
> > The intended semantics of MNT_LOCKED are to not let an unprivileged user
> > see under mounts they would never be able to see under without creating
> > a mount namespace.
> > 
> > The mount point of an automount is pretty clearly something that is safe
> > to see under.  Doubly so if this is a directory that will always be
> > empty on a pseudo filesystem (aka autofs).
> 
> Does anybody have objections to the following?
> 
> [PATCH] finish_automount(): don't leak MNT_LOCKED from parent to child
> 
> Intention for MNT_LOCKED had always been to protect the internal
> mountpoints within a subtree that got copied across the userns boundary,
> not the mountpoint that tree got attached to - after all, it _was_
> exposed before the copying.
> 
> For roots of secondary copies that is enforced in attach_recursive_mnt() -
> MNT_LOCKED is explicitly stripped for those.  For the root of primary
> copy we are almost always guaranteed that MNT_LOCKED won't be there,
> so attach_recursive_mnt() doesn't bother.  Unfortunately, one call
> chain got overlooked - triggering e.g. NFS referral will have the

/me rolls eyes. Why is it always NFS.

> submount inherit the public flags from parent; that's fine for such
> things as read-only, nosuid, etc., but not for MNT_LOCKED.
> 
> This is particularly pointless since the mount attached by finish_automount()
> is usually expirable, which makes any protection granted by MNT_LOCKED
> null and void; just wait for a while and that mount will go away on its own.
> 
> The minimal fix is to have do_add_mount() treat MNT_LOCKED the same
> way as other internal-only flags.  Longer term it would be cleaner to
> deal with that in attach_recursive_mnt(), but that takes a bit more
> massage, so let's go with the one-liner fix for now.
> 
> Fixes: 5ff9d8a65ce8 ("vfs: Lock in place mounts from more privileged users")
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> ---

Reviewed-by: Christian Brauner <brauner@xxxxxxxxxx>

> diff --git a/fs/namespace.c b/fs/namespace.c
> index 04a9bb9f31fa..352b4ccf1aaa 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3761,7 +3761,7 @@ static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
>  {
>  	struct mount *parent = real_mount(path->mnt);
>  
> -	mnt_flags &= ~MNT_INTERNAL_FLAGS;
> +	mnt_flags &= ~(MNT_INTERNAL_FLAGS | MNT_LOCKED);
>  
>  	if (unlikely(!check_mnt(parent))) {
>  		/* that's acceptable only for automounts done in private ns */