Re: [RFC] MNT_LOCKED vs. finish_automount()

"Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> · Mon, 05 May 2025 08:52:16 -0500

Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Fri, May 02, 2025 at 10:46:26PM -0500, Eric W. Biederman wrote:
>> Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:
>> 
>> > 	Back in 2011, when ->d_automount() had been introduced,
>> > we went with "stepping on NFS referral, etc., has the submount
>> > inherit the flags of parent one" (with the obvious exceptions
>> > for internal-only flags).  Back then MNT_LOCKED didn't exist.
>> >
>> > 	Two years later, when MNT_LOCKED had been added, an explicit
>> > "don't set MNT_LOCKED on expirable mounts when propagating across
>> > the userns boundary; their underlying mountpoints can be exposed
>> > whenever the original expires anyway".  Same went for root of
>> > subtree attached by explicit mount --[r]bind - the mountpoint
>> > had been exposed before the call, after all and for roots of
>> > any propagation copies created by such (same reason).  Normal mount
>> > (created by do_new_mount()) could never get MNT_LOCKED to start with.
>> >
>> > 	However, mounts created by finish_automount() bloody well
>> > could - if the parent mount had MNT_LOCKED on it, submounts would
>> > inherited it.  Even if they had been expirable.  Moreover, all their
>> > propagation copies would have MNT_LOCKED stripped out.
>> >
>> > 	IMO this inconsistency is a bug; MNT_LOCKED should not
>> > be inherited in finish_automount().
>> >
>> > 	Eric, is there something subtle I'm missing here?
>> 
>> I don't think you are missing anything.  This looks like a pretty clear
>> cut case of simply not realizing finish_automount was special in a way
>> that could result in MNT_LOCKED getting set.
>> 
>> I skimmed through the code just a minute ago and my reading of it
>> matches your reading of it above.
>> 
>> The intended semantics of MNT_LOCKED are to not let an unprivileged user
>> see under mounts they would never be able to see under without creating
>> a mount namespace.
>> 
>> The mount point of an automount is pretty clearly something that is safe
>> to see under.  Doubly so if this is a directory that will always be
>> empty on a pseudo filesystem (aka autofs).
>
> Does anybody have objections to the following?
>
> [PATCH] finish_automount(): don't leak MNT_LOCKED from parent to child
>
> Intention for MNT_LOCKED had always been to protect the internal
> mountpoints within a subtree that got copied across the userns boundary,
> not the mountpoint that tree got attached to - after all, it _was_
> exposed before the copying.
>
> For roots of secondary copies that is enforced in attach_recursive_mnt() -
> MNT_LOCKED is explicitly stripped for those.  For the root of primary
> copy we are almost always guaranteed that MNT_LOCKED won't be there,
> so attach_recursive_mnt() doesn't bother.  Unfortunately, one call
> chain got overlooked - triggering e.g. NFS referral will have the
> submount inherit the public flags from parent; that's fine for such
> things as read-only, nosuid, etc., but not for MNT_LOCKED.
>
> This is particularly pointless since the mount attached by finish_automount()
> is usually expirable, which makes any protection granted by MNT_LOCKED
> null and void; just wait for a while and that mount will go away on its own.
>
> The minimal fix is to have do_add_mount() treat MNT_LOCKED the same
> way as other internal-only flags.  Longer term it would be cleaner to
> deal with that in attach_recursive_mnt(), but that takes a bit more
> massage, so let's go with the one-liner fix for now.

How would you deal with this in attach_recursive_mnt?  The problem case
appears to be an automount on top of a recursive mount.  So I don't
see the recursive mount code path being involved at all.

Given that only new mounts explicitly specified by a user and
finish_automount call do_add_mount this looks safe.  Having
MNT_LOCKED set will always be inappropriate in these two
cases.

Eric

> Fixes: 5ff9d8a65ce8 ("vfs: Lock in place mounts from more privileged users")
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> ---
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 04a9bb9f31fa..352b4ccf1aaa 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3761,7 +3761,7 @@ static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
>  {
>  	struct mount *parent = real_mount(path->mnt);
>  
> -	mnt_flags &= ~MNT_INTERNAL_FLAGS;
> +	mnt_flags &= ~(MNT_INTERNAL_FLAGS | MNT_LOCKED);
>  
>  	if (unlikely(!check_mnt(parent))) {
>  		/* that's acceptable only for automounts done in private ns */