Re: [RFC][PATCH] btrfs_get_tree_subvol(): switch from fc_mount() to vfs_create_mount()

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Tue, 6 May 2025 20:05:13 +0100

On Tue, May 06, 2025 at 08:34:27PM +0200, Klara Modin wrote:

> > What's more, on the overlayfs side we managed to get to
> >         upper_mnt = clone_private_mount(upperpath);
> >         err = PTR_ERR(upper_mnt);
> >         if (IS_ERR(upper_mnt)) {
> >                 pr_err("failed to clone upperpath\n");
> >                 goto out;
> > so the upper path had been resolved...
> > 
> > OK, let's try to see what clone_private_mount() is unhappy about...
> > Could you try the following on top of -next + braino fix and see
> > what shows up?  Another interesting thing, assuming you can get
> > to shell after overlayfs mount failure, would be /proc/self/mountinfo
> > contents and stat(1) output for upper path of your overlayfs mount...
> 
> It looks like the mount never succeeded in the first place? It doesn't
> appear in /proc/self/mountinfo at all:
> 
> 2 2 0:2 / / rw - rootfs rootfs rw
> 24 2 0:22 / /proc rw,relatime - proc proc rw
> 25 2 0:23 / /sys rw,relatime - sysfs sys rw
> 26 2 0:6 / /dev rw,relatime - devtmpfs dev rw,size=481992k,nr_inodes=120498,mode=755
> 27 2 259:1 / /mnt/root-ro ro,relatime - squashfs /dev/nvme0n1 ro,errors=continue
> 
> I get the "kern_mount?" message.

What the... actually, the comment in front of that check makes no
sense whatsoever - this is *not* about anything kernel-internal; we get
there for mounts that are absolute roots of some non-anonymous
namespace; kernel-internal ones already fail on if (!is_mounted(...))
just above that.
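For what it's worth, the mountinfo quoted above already contains one such
mount: rootfs shows up with mount ID 2 and parent ID 2.  A rough userspace
sketch of picking those out, assuming the parent-ID column mirrors
mnt_parent, so that a mount with no parent lists its own ID there.  The
struct and both helpers are made up for illustration; nothing here is
kernel code.

```c
#include <stdio.h>

/* Hypothetical holder for the mountinfo fields we care about. */
struct mi_entry {
	int id;			/* field 1: mount ID */
	int parent;		/* field 2: parent mount ID */
	char point[256];	/* field 5: mount point */
};

/*
 * Parse "ID parent major:minor root mountpoint ..." from one line.
 * Returns 0 on success, -1 if the line does not look like mountinfo.
 */
static int parse_mountinfo_line(const char *line, struct mi_entry *e)
{
	if (sscanf(line, "%d %d %*u:%*u %*s %255s",
		   &e->id, &e->parent, e->point) != 3)
		return -1;
	return 0;
}

/* Assumption: no parent <=> parent ID equals the mount's own ID. */
static int is_absolute_root(const struct mi_entry *e)
{
	return e->id == e->parent;
}
```

Fed the first quoted line ("2 2 0:2 / / rw - rootfs rootfs rw") that flags
rootfs; the other four lines all have parent 2 and are not flagged.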

OK, the comment came from db04662e2f4f "fs: allow detached mounts
in clone_private_mount()" and it does point in an interesting
direction - the commit message there speaks of overlayfs and the use of
descriptors to specify layers.

Not that check_for_nsfs_mounts() (from the same commit) made any sense
there - we don't *care* about anything mounted somewhere under that mount,
since whatever's mounted on top of it does not follow into the copy
(which is what the has_locked_children() call is about - in effect, in the
copy you see all mountpoints that had been covered in the original)...

Oh, well - so we are seeing an absolute root of some non-anonymous
namespace there.  Or a weird detached mount claimed to belong to
some namespace, anyway.
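Spelled out as a userspace toy, the cases look roughly like this.  It is
a sketch only: struct mnt_model, its fields and classify() are invented
stand-ins for what the kernel helpers look at, and the unbindable, nsfs
and locked-children checks are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the checks inspect. */
struct mnt_model {
	bool has_parent;	/* models mnt_has_parent() */
	bool our_ns;		/* models check_mnt() */
	bool mounted;		/* models is_mounted() */
	bool anon_ns;		/* models is_anon_ns(mnt->mnt_ns) */
	bool ns_root;		/* models mnt == mnt->mnt_ns->root */
};

/* Returns "ok" if the modelled checks pass, else the reason for refusal. */
static const char *classify(const struct mnt_model *m)
{
	if (m->has_parent)
		return m->our_ns ? "ok" : "mounted, but not in our namespace";

	/* No parent: kernel-internal mounts are expected to stop here. */
	if (!m->mounted)
		return "not mounted";

	/* No parent, in a non-anonymous namespace: the case in question. */
	if (!m->anon_ns)
		return m->ns_root ? "absolute root"
				  : "detached, but claimed to be in some ns";

	return "ok";
}
```

With has_parent false, mounted true, anon_ns false and ns_root true this
yields "absolute root", i.e. the first of the two possibilities above.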

Let's see if that's the way upperpath comes to be (and get a bit more
information on that weird mount):

diff --git a/fs/namespace.c b/fs/namespace.c
index eb990e9a668a..9b4c4afa2b29 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2480,31 +2480,52 @@ struct vfsmount *clone_private_mount(const struct path *path)
 
 	guard(rwsem_read)(&namespace_sem);
 
-	if (IS_MNT_UNBINDABLE(old_mnt))
+	if (IS_MNT_UNBINDABLE(old_mnt)) {
+		pr_err("unbindable\n");
 		return ERR_PTR(-EINVAL);
+	}
 
 	if (mnt_has_parent(old_mnt)) {
-		if (!check_mnt(old_mnt))
+		if (!check_mnt(old_mnt)) {
+			pr_err("mounted, but not in our namespace\n");
 			return ERR_PTR(-EINVAL);
+		}
 	} else {
-		if (!is_mounted(&old_mnt->mnt))
+		if (!is_mounted(&old_mnt->mnt)) {
+			pr_err("not mounted\n");
 			return ERR_PTR(-EINVAL);
+		}
 
 		/* Make sure this isn't something purely kernel internal. */
-		if (!is_anon_ns(old_mnt->mnt_ns))
+		if (!is_anon_ns(old_mnt->mnt_ns)) {
+			if (old_mnt == old_mnt->mnt_ns->root)
+				pr_err("absolute root\n");
+			else
+				pr_err("detached, but claimed to be in some ns\n");
+			if (check_mnt(old_mnt))
+				pr_err("our namespace, at that\n");
+			else
+				pr_err("some other non-anon namespace\n");
 			return ERR_PTR(-EINVAL);
+		}
 
 		/* Make sure we don't create mount namespace loops. */
-		if (!check_for_nsfs_mounts(old_mnt))
+		if (!check_for_nsfs_mounts(old_mnt)) {
+			pr_err("shite with nsfs\n");
 			return ERR_PTR(-EINVAL);
+		}
 	}
 
-	if (has_locked_children(old_mnt, path->dentry))
+	if (has_locked_children(old_mnt, path->dentry)) {
+		pr_err("has locked children\n");
 		return ERR_PTR(-EINVAL);
+	}
 
 	new_mnt = clone_mnt(old_mnt, path->dentry, CL_PRIVATE);
-	if (IS_ERR(new_mnt))
+	if (IS_ERR(new_mnt)) {
+		pr_err("clone_mnt failed (%ld)\n", PTR_ERR(new_mnt));
 		return ERR_PTR(-EINVAL);
+	}
 
 	/* Longterm mount to be removed by kern_unmount*() */
 	new_mnt->mnt_ns = MNT_NS_INTERNAL;