On Tue, Jun 10, 2025 at 09:21:30AM +0100, Al Viro wrote:
> Originally MNT_LOCKED meant only one thing - "don't let this mount to
> be peeled off its parent, we don't want to have its mountpoint exposed".
> Accordingly, it had only been set on mounts that *do* have a parent.
> Later it got overloaded with another use - setting it on the absolute
> root had given free protection against umount(2) of absolute root
> (was possible to trigger, oopsed).  Not a bad trick, but it ended
> up costing more than it bought us.  Unfortunately, the cost included
> both hard-to-reason-about logics and a subtle race between
> mount -o remount,ro and mount --[r]bind - lockless &= ~MNT_LOCKED in
> the end of __do_loopback() could race with sb_prepare_remount_readonly()
> setting and clearing MNT_HOLD_WRITE (under mount_lock, as it should
> be).  The race wouldn't be much of a problem (there are other ways to
> deal with it), but the subtlety is.
>
> Turns out that nobody except umount(2) had ever made use of having
> MNT_LOCKED set on absolute root.  So let's give up on that trick,
> clever as it had been, add an explicit check in do_umount() and
> return to using MNT_LOCKED only for mounts that have a parent.
>
> It means that
> 	* clone_mnt() no longer copies MNT_LOCKED
> 	* copy_tree() sets it on submounts if their counterparts had
> been marked such, and does that right next to attach_mnt() in there,
> in the same mount_lock scope.
> 	* __do_loopback() no longer needs to strip MNT_LOCKED off the
> root of subtree it's about to return; no store, no race.
> 	* init_mount_tree() doesn't bother setting MNT_LOCKED on absolute
> root.
> 	* lock_mnt_tree() does not set MNT_LOCKED on the subtree's root;
> accordingly, its caller (loop in attach_recursive_mnt()) does not need to
> bother stripping that MNT_LOCKED on root.  Note that lock_mnt_tree() setting
> MNT_LOCKED on submounts happens in the same mount_lock scope as __attach_mnt()
> (from commit_tree()) that makes them reachable.
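
For anyone following along, here is a rough userspace sketch of the lost-update pattern the commit message describes. It is not kernel code: the DEMO_* bit values and both function names are made up for illustration, and the interleaving is replayed by hand in a single thread so the outcome is deterministic. It only shows why a lockless read-modify-write on a flags word can silently drop a bit that another path set under a lock.

```c
#include <assert.h>

/* Hypothetical bit values for illustration only; not the kernel's definitions. */
#define DEMO_WRITE_HOLD 0x1u
#define DEMO_LOCKED     0x2u

/*
 * Replay one bad interleaving by hand:
 *   A = lockless "flags &= ~DEMO_LOCKED" (stands in for the old tail of
 *       __do_loopback())
 *   B = "flags |= DEMO_WRITE_HOLD" done under a lock (stands in for the
 *       remount-readonly side)
 * A's load happens before B's store, A's store happens after it.
 */
unsigned int replay_lost_update(unsigned int flags)
{
	unsigned int stale = flags;      /* A: load */
	flags |= DEMO_WRITE_HOLD;        /* B: set the hold bit */
	flags = stale & ~DEMO_LOCKED;    /* A: store a value computed from the stale load */
	return flags;                    /* B's bit is gone */
}

/* The same two updates with no overlap: both effects survive. */
unsigned int replay_serialized(unsigned int flags)
{
	flags |= DEMO_WRITE_HOLD;        /* B runs to completion */
	flags &= ~DEMO_LOCKED;           /* then A runs to completion */
	return flags;
}

/* Small self-check of the two replays above. */
void check_replays(void)
{
	assert((replay_lost_update(DEMO_LOCKED) & DEMO_WRITE_HOLD) == 0);
	assert((replay_serialized(DEMO_LOCKED) & DEMO_WRITE_HOLD) != 0);
}
```

Starting from a word with only DEMO_LOCKED set, the first replay ends with the hold bit lost and the second keeps it. With the patch, the store in __do_loopback() is gone altogether, so there is nothing left to interleave.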
>
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> ---

Reviewed-by: Christian Brauner <brauner@xxxxxxxxxx>

>  fs/namespace.c | 32 +++++++++++++++-----------------
>  1 file changed, 15 insertions(+), 17 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index e783eb801060..d6c81eab6a11 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1349,7 +1349,7 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
>  	}
>
>  	mnt->mnt.mnt_flags = old->mnt.mnt_flags;
> -	mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL);
> +	mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL|MNT_LOCKED);
>
>  	atomic_inc(&sb->s_active);
>  	mnt->mnt.mnt_idmap = mnt_idmap_get(mnt_idmap(&old->mnt));
> @@ -2024,6 +2024,9 @@ static int do_umount(struct mount *mnt, int flags)
>  	if (mnt->mnt.mnt_flags & MNT_LOCKED)
>  		goto out;
>

This deserves a comment imho.

> +	if (!mnt_has_parent(mnt))
> +		goto out;