Re: [PATCH v3 44/48] copy_tree(): don't link the mounts via mnt_list

"Lai, Yi" <yi1.lai@xxxxxxxxxxxxxxx> · Fri, 15 Aug 2025 11:19:31 +0800

On Fri, Aug 15, 2025 at 12:21:14AM +0100, Al Viro wrote:
> On Wed, Aug 13, 2025 at 08:32:24AM +0100, Al Viro wrote:
> > On Wed, Aug 13, 2025 at 08:13:03AM +0100, Al Viro wrote:
> > > On Wed, Aug 13, 2025 at 02:45:25PM +0800, Lai, Yi wrote:
> > > > Syzkaller repro code:
> > > > https://github.com/laifryiee/syzkaller_logs/tree/main/250813_093835_attach_recursive_mnt/repro.c
> > > 
> > > 404: The main branch of syzkaller_logs does not contain the path 250813_093835_attach_recursive_mnt/repro.c.
> > 
> > https://github.com/laifryiee/syzkaller_logs/blob/main/250813_093835_attach_recursive_mnt/repro.c
> > 
> > does get it...  Anyway, I'm about to fall down right now (half past 3am here),
> > will take a look once I get some sleep...
> 
> OK, I think I understand what's going on there.  FWIW, reproducer can be
> greatly simplified:
> 
> cd /tmp
> mkdir a
> mount --bind a a
> mount --make-shared a
> while mount --bind a a; do echo splat; done
> 
> Beginning of that thing is to make it possible to clean the resulting mess
> out, when after about 16 iterations you run out of limit on the number of
> mounts - you are explicitly asking to double the number under /tmp/a
> on each iteration.  And default /proc/sys/fs/mount-max is set to 100000...
> 
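
To make the "about 16 iterations" arithmetic concrete, here is a rough
sketch of the doubling count. It assumes the tree under /tmp/a starts at a
single mount and exactly doubles on each pass, and it ignores every other
mount in the namespace; doublings_within_limit() is a hypothetical helper,
not anything in the kernel.

```c
/*
 * Hypothetical helper: how many times can `mounts` double without
 * exceeding `limit`?  Assumes exact doubling per iteration.
 */
static unsigned int doublings_within_limit(unsigned long mounts,
					   unsigned long limit)
{
	unsigned int n = 0;

	if (mounts == 0)
		return 0;
	/* Doubling is still allowed while 2 * mounts <= limit. */
	while (mounts <= limit / 2) {
		mounts *= 2;
		n++;
	}
	return n;
}
```

With mounts = 1 and limit = 100000 (the mount-max default quoted above),
the loop stops once the count reaches 65536, since the next doubling would
be 131072.
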
> As for cleaning up, umount2("/tmp/a", MNT_DETACH); will do it...
> 
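
For anyone who wants the cleanup as a callable helper, a minimal sketch
around the umount2() call mentioned above. The wrapper name detach_tree()
is made up for illustration; it needs the usual mount privileges to
succeed on a real mount.

```c
#include <errno.h>
#include <sys/mount.h>

/*
 * Hypothetical wrapper: lazily detach whatever is mounted at `path`.
 * Returns 0 on success, or a negative errno value on failure.
 */
static int detach_tree(const char *path)
{
	if (umount2(path, MNT_DETACH) == -1)
		return -errno;
	return 0;
}
```

Calling detach_tree("/tmp/a") after the reproducer corresponds to the
umount2("/tmp/a", MNT_DETACH) suggested above.
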
> The minimal fix should be to do commit_tree() just *before* the preceding
> if (q) {...} in attach_recursive_mnt().
> 
> Said that, this is not the only problem exposed by that reproducer - with
> that kind of long chain of overmounts, all peers to each other, we hit
> two more stupidities on the umount side - reparent() shouldn't fucking
> bother if the overmount is also going to be taken out and change_mnt_propagation()
> only needs to look for propagation source if the victim has slaves (those
> will need to be moved to new master) *or* if the victim is getting turned
> into a slave.
> 
> See if the following recovers the performance:
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index a191c6519e36..88db58061919 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1197,10 +1197,7 @@ static void commit_tree(struct mount *mnt)
>  
>  	if (!mnt_ns_attached(mnt)) {
>  		for (struct mount *m = mnt; m; m = next_mnt(m, mnt))
> -			if (unlikely(mnt_ns_attached(m)))
> -				m = skip_mnt_tree(m);
> -			else
> -				mnt_add_to_ns(n, m);
> +			mnt_add_to_ns(n, m);
>  		n->nr_mounts += n->pending_mounts;
>  		n->pending_mounts = 0;
>  	}
> @@ -2704,6 +2701,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>  			lock_mnt_tree(child);
>  		q = __lookup_mnt(&child->mnt_parent->mnt,
>  				 child->mnt_mountpoint);
> +		commit_tree(child);
>  		if (q) {
>  			struct mountpoint *mp = root.mp;
>  			struct mount *r = child;
> @@ -2713,7 +2711,6 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>  				mp = shorter;
>  			mnt_change_mountpoint(r, mp, q);
>  		}
> -		commit_tree(child);
>  	}
>  	unpin_mountpoint(&root);
>  	unlock_mount_hash();
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 81f7599bdac4..040a8559b8f5 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -111,7 +111,8 @@ void change_mnt_propagation(struct mount *mnt, int type)
>  		return;
>  	}
>  	if (IS_MNT_SHARED(mnt)) {
> -		m = propagation_source(mnt);
> +		if (type == MS_SLAVE || !hlist_empty(&mnt->mnt_slave_list))
> +			m = propagation_source(mnt);
>  		if (list_empty(&mnt->mnt_share)) {
>  			mnt_release_group_id(mnt);
>  		} else {
> @@ -595,6 +596,8 @@ static void reparent(struct mount *m)
>  	struct mount *p = m;
>  	struct mountpoint *mp;
>  
> +	if (will_be_unmounted(m))
> +		return;
>  	do {
>  		mp = p->mnt_mp;
>  		p = p->mnt_parent;

After applying this patch on top of linux-next, the issue can no longer be reproduced.

Regards,
Yi Lai