On Fri, Aug 15, 2025 at 12:21:14AM +0100, Al Viro wrote:
> On Wed, Aug 13, 2025 at 08:32:24AM +0100, Al Viro wrote:
> > On Wed, Aug 13, 2025 at 08:13:03AM +0100, Al Viro wrote:
> > > On Wed, Aug 13, 2025 at 02:45:25PM +0800, Lai, Yi wrote:
> > > > Syzkaller repro code:
> > > > https://github.com/laifryiee/syzkaller_logs/tree/main/250813_093835_attach_recursive_mnt/repro.c
> > >
> > > 404: The main branch of syzkaller_logs does not contain the path 250813_093835_attach_recursive_mnt/repro.c.
> >
> > https://github.com/laifryiee/syzkaller_logs/blob/main/250813_093835_attach_recursive_mnt/repro.c
> >
> > does get it... Anyway, I'm about to fall down right now (half past 3am here),
> > will take a look once I get some sleep...
>
> OK, I think I understand what's going on there. FWIW, reproducer can be
> greatly simplified:
>
> cd /tmp
> mkdir a
> mount --bind a a
> mount --make-shared a
> while mount --bind a a; do echo splat; done
>
> Beginning of that thing is to make it possible to clean the resulting mess
> out, when after about 16 iterations you run out of limit on the number of
> mounts - you are explicitly asking to double the number under /tmp/a
> on each iteration. And default /proc/sys/fs/mount-max is set to 100000...
>
> As for cleaning up, umount2("/tmp/a", MNT_DETACH); will do it...
>
> The minimal fix should be to do commit_tree() just *before* the preceding
> if (q) {...} in attach_recursive_mnt().
>
> Said that, this is not the only problem exposed by that reproducer - with
> that kind of long chain of overmounts, all peers to each other, we hit
> two more stupidities on the umount side - reparent() shouldn't fucking
> bother if the overmount is also going to be taken out and change_mnt_type()
> only needs to look for propagation source if the victim has slaves (those
> will need to be moved to new master) *or* if the victim is getting turned
> into a slave.
>
> See if the following recovers the performance:
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index a191c6519e36..88db58061919 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1197,10 +1197,7 @@ static void commit_tree(struct mount *mnt)
>
>  	if (!mnt_ns_attached(mnt)) {
>  		for (struct mount *m = mnt; m; m = next_mnt(m, mnt))
> -			if (unlikely(mnt_ns_attached(m)))
> -				m = skip_mnt_tree(m);
> -			else
> -				mnt_add_to_ns(n, m);
> +			mnt_add_to_ns(n, m);
>  		n->nr_mounts += n->pending_mounts;
>  		n->pending_mounts = 0;
>  	}
> @@ -2704,6 +2701,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>  		lock_mnt_tree(child);
>  		q = __lookup_mnt(&child->mnt_parent->mnt,
>  				 child->mnt_mountpoint);
> +		commit_tree(child);
>  		if (q) {
>  			struct mountpoint *mp = root.mp;
>  			struct mount *r = child;
> @@ -2713,7 +2711,6 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>  				mp = shorter;
>  			mnt_change_mountpoint(r, mp, q);
>  		}
> -		commit_tree(child);
>  	}
>  	unpin_mountpoint(&root);
>  	unlock_mount_hash();
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 81f7599bdac4..040a8559b8f5 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -111,7 +111,8 @@ void change_mnt_propagation(struct mount *mnt, int type)
>  		return;
>  	}
>  	if (IS_MNT_SHARED(mnt)) {
> -		m = propagation_source(mnt);
> +		if (type == MS_SLAVE || !hlist_empty(&mnt->mnt_slave_list))
> +			m = propagation_source(mnt);
>  		if (list_empty(&mnt->mnt_share)) {
>  			mnt_release_group_id(mnt);
>  		} else {
> @@ -595,6 +596,8 @@ static void reparent(struct mount *m)
>  	struct mount *p = m;
>  	struct mountpoint *mp;
>
> +	if (will_be_unmounted(m))
> +		return;
>  	do {
>  		mp = p->mnt_mp;
>  		p = p->mnt_parent;

After applying this patch on top of linux-next, issue cannot be reproduced.

Regards,
Yi Lai
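
The "about 16 iterations" figure in the quoted message can be checked with simple arithmetic. Below is a minimal userspace sketch, not kernel code: doublings_before_limit() and its parameters are hypothetical names, and it assumes the number of mounts under /tmp/a starts at 1 and exactly doubles on each successful iteration, with an iteration rejected when the resulting total would exceed mount-max.

```c
/*
 * Hypothetical helper (illustration only, not part of the kernel):
 * count how many doubling iterations succeed before the namespace
 * would exceed `limit` mounts.  `other` is the assumed number of
 * unrelated mounts already present in the namespace.
 */
unsigned int doublings_before_limit(unsigned long limit, unsigned long other)
{
	unsigned long under_a = 1;	/* the initial "mount --bind a a" */
	unsigned int iterations = 0;

	/* each iteration adds as many mounts as already sit under /tmp/a */
	while (other + under_a + under_a <= limit) {
		under_a += under_a;
		iterations++;
	}
	return iterations;
}
```

Under those assumptions, a limit of 100000 with a few dozen unrelated mounts gives 16 successful iterations: 2^16 = 65536 still fits, while the next doubling to 2^17 = 131072 does not.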