[bug] propagate_mount_busy() giving false negatives

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Mon, 9 Jun 2025 18:41:38 +0100

On mainline (both 6.16-rc1 and e.g. debian-testing 6.12.27 kernel):
; cat >a.sh <<'EOF'
P=/tmp/playground
mkdir $P
mount -t tmpfs none $P
mount --make-private $P
mkdir $P/A
mkdir $P/B
mount -t tmpfs none $P/A
mount --make-shared $P/A
mount --bind $P/A $P/B
mount --make-slave $P/B
mkdir $P/A/x
mount --bind $P/A $P/A/x
cd $P/B/x
mount --bind $P/A/x $P/B/x/x
umount $P/A/x
EOF
; . a.sh
; /bin/pwd
pwd: couldn't find directory entry in '..' with matching i-node
; pwd
/tmp/playground/B/x
; grep /tmp/playground /proc/self/mountinfo
36 30 0:28 / /tmp/playground rw,relatime - tmpfs none rw
37 36 0:30 / /tmp/playground/A rw,relatime shared:14 - tmpfs none rw
38 36 0:30 / /tmp/playground/B rw,relatime master:14 - tmpfs none rw
;

In other words, non-lazy umount of /tmp/playground/A/x has succeeded,
taking out /tmp/playground/B/x/x and /tmp/playground/B/x along with
it, with the latter being definitely in use.

What happens is that we have propagate_mount_busy() taking one look
at /tmp/playground/B/x and deciding that its use count doesn't matter:
propagate_umount() won't take it out, since there's something mounted
on top of it.  And so there is - /tmp/playground/B/x/x, which is taken
out by the same propagate_umount(), so... the refcount on
/tmp/playground/B/x did matter, after all.
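
To make the false negative concrete, here is a toy model of the two ways
of deciding "busy".  It is a sketch, not kernel code: the Mount class,
the refs counter and the helper names (naive_busy, doomed,
chain_aware_busy) are all made up for illustration, and the real
refcounting and propagation rules are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so mounts can go in a set
class Mount:
    name: str
    refs: int = 0  # toy stand-in for "somebody is using this mount"
    children: list = field(default_factory=list)  # mounts on top of this one


def naive_busy(candidates):
    # Mimics the shortcut described above: a candidate with anything mounted
    # on it is assumed to survive, so its use count is never looked at.
    return any(m.refs > 0 for m in candidates if not m.children)


def doomed(candidates):
    # A candidate goes away once every mount on top of it goes away too.
    # A child outside the candidate set never goes, so it pins its parent.
    gone = set()
    progress = True
    while progress:
        progress = False
        for m in candidates:
            if m not in gone and all(c in gone for c in m.children):
                gone.add(m)
                progress = True
    return gone


def chain_aware_busy(candidates):
    # Only look at use counts of mounts that would really be taken out.
    return any(m.refs > 0 for m in doomed(candidates))


# State just before the umount in the reproducer: the shell sits in B/x,
# and B/x/x is mounted on top of it.  Both are propagation candidates.
b_x_x = Mount("B/x/x")
b_x = Mount("B/x", refs=1, children=[b_x_x])
candidates = [b_x, b_x_x]

print(naive_busy(candidates))        # False: B/x is skipped, umount goes ahead
print(chain_aware_busy(candidates))  # True: B/x would be taken out while in use
```

With refs=0 on B/x (no cd into it) both checks say "not busy" in this
model, which matches the case where the umount is expected to succeed.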

Goes back at least to 2014 (i.e. Eric's scalability work in that area),
but I wouldn't be surprised if it turns out to be as old as 2005...

Oh, well... at least with the rewrite of propagate_umount() it should be
reasonably easy to figure out the set of relevant mounts; in the non-lazy
case we have only one mount in the original set, so all candidates will
have the same mountpoint dentry, which means that the subgraph consisting
of umount candidates will be a bunch of non-intersecting ancestry chains,
making things much simpler than in the generic case...

Still not fun.  And I very much doubt we can change the behaviour in
the case where nothing is actually busy - without that cd(1) making the
sucker busy, we need umount(2) to take those 3 mounts out ;-/