Re: [RFC][PATCH] btrfs_get_tree_subvol(): switch from fc_mount() to vfs_create_mount()

Klara Modin <klarasmodin@xxxxxxxxx> · Tue, 6 May 2025 21:20:47 +0200

On 2025-05-06 20:05:13 +0100, Al Viro wrote:
> On Tue, May 06, 2025 at 08:34:27PM +0200, Klara Modin wrote:
> 
> > > What's more, on the overlayfs side we managed to get to
> > >         upper_mnt = clone_private_mount(upperpath);
> > >         err = PTR_ERR(upper_mnt);
> > >         if (IS_ERR(upper_mnt)) {
> > >                 pr_err("failed to clone upperpath\n");
> > >                 goto out;
> > > so the upper path had been resolved...
> > > 
> > > OK, let's try to see what clone_private_mount() is unhappy about...
> > > Could you try the following on top of -next + braino fix and see
> > > what shows up?  Another interesting thing, assuming you can get
> > > to shell after overlayfs mount failure, would be /proc/self/mountinfo
> > > contents and stat(1) output for upper path of your overlayfs mount...
> > 
> > > It looks like the mount never succeeded in the first place? It doesn't
> > appear in /proc/self/mountinfo at all:
> > 
> > 2 2 0:2 / / rw - rootfs rootfs rw
> > 24 2 0:22 / /proc rw,relatime - proc proc rw
> > 25 2 0:23 / /sys rw,relatime - sysfs sys rw
> > 26 2 0:6 / /dev rw,relatime - devtmpfs dev rw,size=481992k,nr_inodes=120498,mode=755
> > 27 2 259:1 / /mnt/root-ro ro,relatime - squashfs /dev/nvme0n1 ro,errors=continue
> > 
> > I get the "kern_mount?" message.
> 
> What the... actually, the comment in front of that thing makes no
> sense whatsoever - it's *not* something kernel-internal; we get
> there for mounts that are absolute roots of some non-anonymous
> namespace; kernel-internal ones fail on if (!is_mounted(...))
> just above that.
> 
> OK, the comment came from db04662e2f4f "fs: allow detached mounts
> in clone_private_mount()" and it does point in an interesting
> direction - commit message there speaks of overlayfs and use of
> descriptors to specify layers.
> 
> Not that check_for_nsfs_mounts() (from the same commit) made any sense
> there - we don't *care* about anything mounted somewhere in that mount,
> since whatever's mounted on top of it does not follow into the copy
> (which is what has_locked_children() call is about - in effect, in copy
> you see all mountpoints that had been covered in the original)...
> 
> Oh, well - so we are seeing an absolute root of some non-anonymous
> namespace there.  Or a weird detached mount claimed to belong to
> some namespace, anyway.
> 
> Let's see if that's the way upperpath comes to be (and get a bit more
> information on that weird mount):
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index eb990e9a668a..9b4c4afa2b29 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2480,31 +2480,52 @@ struct vfsmount *clone_private_mount(const struct path *path)
>  
>  	guard(rwsem_read)(&namespace_sem);
>  
> -	if (IS_MNT_UNBINDABLE(old_mnt))
> +	if (IS_MNT_UNBINDABLE(old_mnt)) {
> +		pr_err("unbindable");
>  		return ERR_PTR(-EINVAL);
> +	}
>  
>  	if (mnt_has_parent(old_mnt)) {
> -		if (!check_mnt(old_mnt))
> +		if (!check_mnt(old_mnt)) {
> +			pr_err("mounted, but not in our namespace");
>  			return ERR_PTR(-EINVAL);
> +		}
>  	} else {
> -		if (!is_mounted(&old_mnt->mnt))
> +		if (!is_mounted(&old_mnt->mnt)) {
> +			pr_err("not mounted");
>  			return ERR_PTR(-EINVAL);
> +		}
>  
>  		/* Make sure this isn't something purely kernel internal. */
> -		if (!is_anon_ns(old_mnt->mnt_ns))
> +		if (!is_anon_ns(old_mnt->mnt_ns)) {
> +			if (old_mnt == old_mnt->mnt_ns->root)
> +				pr_err("absolute root");
> +			else
> +				pr_err("detached, but claimed to be in some ns");
> +			if (check_mnt(old_mnt))
> +				pr_err("our namespace, at that");
> +			else
> +				pr_err("some other non-anon namespace");
>  			return ERR_PTR(-EINVAL);
> +		}
>  
>  		/* Make sure we don't create mount namespace loops. */
> -		if (!check_for_nsfs_mounts(old_mnt))
> +		if (!check_for_nsfs_mounts(old_mnt)) {
> +			pr_err("shite with nsfs");
>  			return ERR_PTR(-EINVAL);
> +		}
>  	}
>  
> -	if (has_locked_children(old_mnt, path->dentry))
> +	if (has_locked_children(old_mnt, path->dentry)) {
> +		pr_err("has locked children");
>  		return ERR_PTR(-EINVAL);
> +	}
>  
>  	new_mnt = clone_mnt(old_mnt, path->dentry, CL_PRIVATE);
> -	if (IS_ERR(new_mnt))
> +	if (IS_ERR(new_mnt)) {
> +		pr_err("clone_mnt failed (%ld)", PTR_ERR(new_mnt));
>  		return ERR_PTR(-EINVAL);
> +	}
>  
>  	/* Longterm mount to be removed by kern_unmount*() */
>  	new_mnt->mnt_ns = MNT_NS_INTERNAL;

I then get:

[    0.881616] absolute root
[    0.881618] our namespace, at that

In btrfs_get_tree_subvol:

	ret = vfs_get_tree(dup_fc);
	if (!ret) {
		ret = btrfs_reconfigure_for_mount(dup_fc);
		up_write(&dup_fc->root->d_sb->s_umount);
	}
	if (!ret)
		mnt = vfs_create_mount(fc);
	else
		mnt = ERR_PTR(ret);
	put_fs_context(dup_fc);

Should that last call perhaps be:
		mnt = vfs_create_mount(dup_fc);

since dup_fc is the context that vfs_get_tree() was called on? With that
change it works for me.
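
To illustrate the suspected mix-up, here is a toy userspace model (not
kernel code). Every toy_* name is a hypothetical stand-in, and it assumes
that the original fc has no root yet at this point, so only the context
handed to the get_tree step can be mounted:

```c
/* Toy userspace model, NOT kernel code: all toy_* names are hypothetical. */
#include <errno.h>
#include <stddef.h>

struct toy_fs_context {
	const char *root;	/* stand-in for fc->root; NULL until get_tree runs */
};

/* Stand-in for vfs_get_tree(): fills in the root of the context it is given. */
static int toy_get_tree(struct toy_fs_context *fc)
{
	fc->root = "subvol-root";
	return 0;
}

/* Stand-in for vfs_create_mount(): refuses a context that has no root. */
static int toy_create_mount(const struct toy_fs_context *fc)
{
	return fc->root ? 0 : -EINVAL;
}

/*
 * Models the tail of btrfs_get_tree_subvol(). Assumption: fc->root is
 * still unset here. use_dup selects which context gets mounted.
 */
static int toy_get_tree_subvol(int use_dup)
{
	struct toy_fs_context fc = { .root = NULL };
	struct toy_fs_context dup_fc = { .root = NULL };
	int ret = toy_get_tree(&dup_fc);

	if (ret)
		return ret;
	return toy_create_mount(use_dup ? &dup_fc : &fc);
}
```

In this model toy_get_tree_subvol(0), which mounts from fc, fails with
-EINVAL, while toy_get_tree_subvol(1), which mounts from dup_fc, succeeds.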