On 5/9/25 18:59, Pavel Tikhomirov wrote:
On 5/9/25 16:28, Al Viro wrote:
On Fri, May 09, 2025 at 09:26:28AM +0100, Al Viro wrote:
AFAICS, 9ffb14ef61ba "move_mount: allow to add a mount into an existing
group" breaks assertions on ->mnt_share/->mnt_slave. For once, the data
structures in question are actually documented.
Documentation/filesystem/sharedsubtree.rst:
All vfsmounts in a peer group have the same ->mnt_master.
If it is
non-NULL, they form a contiguous (ordered) segment of slave list.
fs/pnode.c:
* Note that peer groups form contiguous segments of slave lists.
fs/namespace.c:do_set_group():
if (IS_MNT_SLAVE(from)) {
struct mount *m = from->mnt_master;
list_add(&to->mnt_slave, &m->mnt_slave_list);
to->mnt_master = m;
}
if (IS_MNT_SHARED(from)) {
to->mnt_group_id = from->mnt_group_id;
list_add(&to->mnt_share, &from->mnt_share);
lock_mount_hash();
set_mnt_shared(to);
unlock_mount_hash();
}
Note that 'to' goes right after 'from' in ->mnt_share (i.e. peer group
list) and into the beginning of the slave list 'from' belongs to. IOW,
contiguity gets broken if 'from' is both IS_MNT_SLAVE and IS_MNT_SHARED.
Which is what happens when the peer group 'from' is in gets propagation
from somewhere.
Agreed, list ordering consistency looks broken by my commit.
It's not hard to fix - something like
if (IS_MNT_SHARED(from)) {
to->mnt_group_id = from->mnt_group_id;
list_add(&to->mnt_share, &from->mnt_share);
if (IS_MNT_SLAVE(from))
list_add(&to->mnt_slave, &from->mnt_slave);
to->mnt_master = from->mnt_master;
lock_mount_hash();
set_mnt_shared(to);
unlock_mount_hash();
} else if (IS_MNT_SLAVE(from)) {
to->mnt_master = from->mnt_master;
list_add(&to->mnt_slave, &from->mnt_master->mnt_slave_list);
}
ought to do it.
Yes it should work.
In case (IS_MNT_SLAVE(from) && !IS_MNT_SHARED(from)) we can probably
also do:
list_add(&to->mnt_slave, &from->mnt_slave);
as next slave after "from" is definitely not from the same shared group
with "from" (as it's not in a shared group) so we won't break list
continuity.
That will allow to simplify code change to:
if (IS_MNT_SLAVE(from)) {
struct mount *m = from->mnt_master;
- list_add(&to->mnt_slave, &m->mnt_slave_list);
+ list_add(&to->mnt_slave, &from->mnt_slave);
to->mnt_master = m;
}
if (IS_MNT_SHARED(from)) {
to->mnt_group_id = from->mnt_group_id;
list_add(&to->mnt_share, &from->mnt_share);
lock_mount_hash();
set_mnt_shared(to);
unlock_mount_hash();
}
If I'm not missing something (didn't test yet).
I'm nowhere near sufficiently awake right now to put
together a regression test, but unless I'm missing something subtle, it
should be possible to get a fairly obvious breakage of propagate_mnt()
out of that...
I managed to see weird behavior like that:
# rmdir /tmp/{A,B,C,D,E,Z}
# unshare -m
mkdir /tmp/{A,B,C,D,E,Z}
mount --make-rprivate /
mount -t tmpfs tmpfs /tmp/A
mount --bind /tmp/A /tmp/Z
mount --make-shared /tmp/A
mount --bind /tmp/A /tmp/B
mount --make-slave /tmp/B
mount --make-shared /tmp/B
mount --bind /tmp/B /tmp/C
mount --bind /tmp/C /tmp/D
mount --bind /tmp/D /tmp/E
./setgroup-v2 /tmp/C /tmp/Z
Note I put ./setgroup-v2 code here:
https://github.com/Snorch/linux-helpers/blob/master/mount_set_group.c
mkdir /tmp/A/subdir
mount -t tmpfs tmpfs /tmp/A/subdir
This creates 16 subdir mounts instead of expected 6:
cat /proc/self/mountinfo | grep /tmp/
1071 1065 0:109 / /tmp/A rw,relatime shared:556 - tmpfs tmpfs
rw,seclabel,inode64
1073 1065 0:109 / /tmp/Z rw,relatime shared:1040 master:556 - tmpfs
tmpfs rw,seclabel,inode64
1076 1065 0:109 / /tmp/B rw,relatime shared:1040 master:556 - tmpfs
tmpfs rw,seclabel,inode64
1077 1065 0:109 / /tmp/C rw,relatime shared:1040 master:556 - tmpfs
tmpfs rw,seclabel,inode64
1078 1065 0:109 / /tmp/D rw,relatime shared:1040 master:556 - tmpfs
tmpfs rw,seclabel,inode64
1079 1065 0:109 / /tmp/E rw,relatime shared:1040 master:556 - tmpfs
tmpfs rw,seclabel,inode64
1080 1071 0:136 / /tmp/A/subdir rw,relatime shared:1041 - tmpfs tmpfs
rw,seclabel,inode64
1081 1073 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1082 1078 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1083 1079 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1084 1076 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1085 1077 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1086 1084 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1087 1085 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1088 1081 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1089 1082 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1090 1083 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1142 1089 0:136 / /tmp/D/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1143 1090 0:136 / /tmp/E/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1144 1086 0:136 / /tmp/B/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1145 1087 0:136 / /tmp/C/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
1146 1088 0:136 / /tmp/Z/subdir rw,relatime shared:1042 master:1041 -
tmpfs tmpfs rw,seclabel,inode64
Maybe that can be converted to a regression test.
Not sufficiently awake is right - wrong address on Cc... Anyway, bedtime
for me...
--
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.