On 2025-05-06 20:05:13 +0100, Al Viro wrote:
> On Tue, May 06, 2025 at 08:34:27PM +0200, Klara Modin wrote:
>
> > > What's more, on the overlayfs side we managed to get to
> > > 	upper_mnt = clone_private_mount(upperpath);
> > > 	err = PTR_ERR(upper_mnt);
> > > 	if (IS_ERR(upper_mnt)) {
> > > 		pr_err("failed to clone upperpath\n");
> > > 		goto out;
> > > so the upper path had been resolved...
> > >
> > > OK, let's try to see what clone_private_mount() is unhappy about...
> > > Could you try the following on top of -next + braino fix and see
> > > what shows up?  Another interesting thing, assuming you can get
> > > to shell after overlayfs mount failure, would be /proc/self/mountinfo
> > > contents and stat(1) output for upper path of your overlayfs mount...
> >
> > It looks like the mount never succeeded in the first place? It doesn't
> > appear in /proc/self/mountinfo at all:
> >
> > 2 2 0:2 / / rw - rootfs rootfs rw
> > 24 2 0:22 / /proc rw,relatime - proc proc rw
> > 25 2 0:23 / /sys rw,relatime - sysfs sys rw
> > 26 2 0:6 / /dev rw,relatime - devtmpfs dev rw,size=481992k,nr_inodes=120498,mode=755
> > 27 2 259:1 / /mnt/root-ro ro,relatime - squashfs /dev/nvme0n1 ro,errors=continue
> >
> > I get the "kern_mount?" message.
>
> What the... actually, the comment in front of that thing makes no
> sense whatsoever - it's *not* something kernel-internal; we get
> there for mounts that are absolute roots of some non-anonymous
> namespace; kernel-internal ones fail on if (!is_mounted(...))
> just above that.
>
> OK, the comment came from db04662e2f4f "fs: allow detached mounts
> in clone_private_mount()" and it does point in an interesting
> direction - commit message there speaks of overlayfs and use of
> descriptors to specify layers.
>
> Not that check_for_nsfs_mounts() (from the same commit) made any sense
> there - we don't *care* about anything mounted somewhere in that mount,
> since whatever's mounted on top of it does not follow into the copy
> (which is what has_locked_children() call is about - in effect, in copy
> you see all mountpoints that had been covered in the original)...
>
> Oh, well - so we are seeing an absolute root of some non-anonymous
> namespace there.  Or a weird detached mount claimed to belong to
> some namespace, anyway.
>
> Let's see if that's the way upperpath comes to be (and get a bit more
> information on that weird mount):
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index eb990e9a668a..9b4c4afa2b29 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2480,31 +2480,52 @@ struct vfsmount *clone_private_mount(const struct path *path)
>  
>  	guard(rwsem_read)(&namespace_sem);
>  
> -	if (IS_MNT_UNBINDABLE(old_mnt))
> +	if (IS_MNT_UNBINDABLE(old_mnt)) {
> +		pr_err("unbindable");
>  		return ERR_PTR(-EINVAL);
> +	}
>  
>  	if (mnt_has_parent(old_mnt)) {
> -		if (!check_mnt(old_mnt))
> +		if (!check_mnt(old_mnt)) {
> +			pr_err("mounted, but not in our namespace");
>  			return ERR_PTR(-EINVAL);
> +		}
>  	} else {
> -		if (!is_mounted(&old_mnt->mnt))
> +		if (!is_mounted(&old_mnt->mnt)) {
> +			pr_err("not mounted");
>  			return ERR_PTR(-EINVAL);
> +		}
>  
>  		/* Make sure this isn't something purely kernel internal. */
> -		if (!is_anon_ns(old_mnt->mnt_ns))
> +		if (!is_anon_ns(old_mnt->mnt_ns)) {
> +			if (old_mnt == old_mnt->mnt_ns->root)
> +				pr_err("absolute root");
> +			else
> +				pr_err("detached, but claimed to be in some ns");
> +			if (check_mnt(old_mnt))
> +				pr_err("our namespace, at that");
> +			else
> +				pr_err("some other non-anon namespace");
>  			return ERR_PTR(-EINVAL);
> +		}
>  
>  		/* Make sure we don't create mount namespace loops. */
> -		if (!check_for_nsfs_mounts(old_mnt))
> +		if (!check_for_nsfs_mounts(old_mnt)) {
> +			pr_err("shite with nsfs");
>  			return ERR_PTR(-EINVAL);
> +		}
>  	}
>  
> -	if (has_locked_children(old_mnt, path->dentry))
> +	if (has_locked_children(old_mnt, path->dentry)) {
> +		pr_err("has locked children");
>  		return ERR_PTR(-EINVAL);
> +	}
>  
>  	new_mnt = clone_mnt(old_mnt, path->dentry, CL_PRIVATE);
> -	if (IS_ERR(new_mnt))
> +	if (IS_ERR(new_mnt)) {
> +		pr_err("clone_mnt failed (%ld)", PTR_ERR(new_mnt));
>  		return ERR_PTR(-EINVAL);
> +	}
>  
>  	/* Longterm mount to be removed by kern_unmount*() */
>  	new_mnt->mnt_ns = MNT_NS_INTERNAL;

I then get:

[    0.881616] absolute root
[    0.881618] our namespace, at that

In btrfs_get_tree_subvol:

	ret = vfs_get_tree(dup_fc);
	if (!ret) {
		ret = btrfs_reconfigure_for_mount(dup_fc);
		up_write(&dup_fc->root->d_sb->s_umount);
	}
	if (!ret)
		mnt = vfs_create_mount(fc);
	else
		mnt = ERR_PTR(ret);
	put_fs_context(dup_fc);

Should it perhaps be:

		mnt = vfs_create_mount(dup_fc);

If I try that it works.
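
For anyone following along, here is a rough user-space sketch of the
decision tree in the instrumented clone_private_mount() above, to show
which branch produces the two messages I see. It is only a model, not
kernel code: struct mnt_model, its boolean fields and diagnose() are
hypothetical names standing in for the results of the real helpers
(check_mnt(), is_mounted(), is_anon_ns() and so on), and the clone_mnt()
failure path is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the state the debug patch inspects. */
struct mnt_model {
	bool unbindable;      /* IS_MNT_UNBINDABLE() */
	bool has_parent;      /* mnt_has_parent() */
	bool in_our_ns;       /* check_mnt() */
	bool mounted;         /* is_mounted() */
	bool anon_ns;         /* is_anon_ns() */
	bool is_ns_root;      /* old_mnt == old_mnt->mnt_ns->root */
	bool nsfs_ok;         /* check_for_nsfs_mounts() */
	bool locked_children; /* has_locked_children() */
};

/*
 * Walk the checks in the same order as the patch. Returns 0 if every
 * check passes, -1 otherwise, with the message(s) written to buf
 * (joined by "; " where the patch prints two lines).
 */
static int diagnose(const struct mnt_model *m, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	if (m->unbindable) {
		snprintf(buf, len, "unbindable");
		return -1;
	}

	if (m->has_parent) {
		if (!m->in_our_ns) {
			snprintf(buf, len, "mounted, but not in our namespace");
			return -1;
		}
	} else {
		if (!m->mounted) {
			snprintf(buf, len, "not mounted");
			return -1;
		}
		if (!m->anon_ns) {
			snprintf(buf, len, "%s; %s",
				 m->is_ns_root ? "absolute root"
					       : "detached, but claimed to be in some ns",
				 m->in_our_ns ? "our namespace, at that"
					      : "some other non-anon namespace");
			return -1;
		}
		if (!m->nsfs_ok) {
			snprintf(buf, len, "shite with nsfs");
			return -1;
		}
	}

	if (m->locked_children) {
		snprintf(buf, len, "has locked children");
		return -1;
	}

	return 0;
}
```

In this model, a mount with no parent that is mounted, sits in a
non-anonymous namespace, is that namespace's root and belongs to our
namespace takes the !anon_ns branch, which matches the pair of messages
in my log.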