RE: [RFC] ceph: strange mount/unmount behavior

On Tue, 2025-08-26 at 11:10 +0200, Christian Brauner wrote:
> On Mon, Aug 25, 2025 at 09:53:48PM +0000, Viacheslav Dubeyko wrote:
> > Hello,
> > 
> > I am investigating an issue with generic/604:
> > 
> > sudo ./check generic/604
> > FSTYP         -- ceph
> > PLATFORM      -- Linux/x86_64 ceph-0005 6.17.0-rc1+ #29 SMP PREEMPT_DYNAMIC Mon
> > Aug 25 13:06:10 PDT 2025
> > MKFS_OPTIONS  -- 192.168.1.213:6789:/scratch
> > MOUNT_OPTIONS -- -o name=admin 192.168.1.213:6789:/scratch /mnt/cephfs/scratch
> > 
> > generic/604 10s ... - output mismatch (see
> > XFSTESTS/xfstestsdev/results//generic/604.out.bad)
> >     --- tests/generic/604.out	2025-02-25 13:05:32.515668548 -0800
> >     +++ XFSTESTS/xfstests-dev/results//generic/604.out.bad	2025-08-25
> > 14:25:49.256780397 -0700
> >     @@ -1,2 +1,3 @@
> >      QA output created by 604
> >     +umount: /mnt/cephfs/scratch: target is busy.
> >      Silence is golden
> >     ...
> >     (Run 'diff -u XFSTESTS/xfstests-dev/tests/generic/604.out XFSTESTS/xfstests-
> > dev/results//generic/604.out.bad'  to see the entire diff)
> > Ran: generic/604
> > Failures: generic/604
> > Failed 1 of 1 tests
> > 
> > As far as I can see, generic/604 intentionally lets the unmount run in the
> > background, so the mount operation starts before the unmount finishes:
> > 
> > # For overlayfs, avoid unmounting the base fs after _scratch_mount tries to
> > # mount the base fs.  Delay the mount attempt by a small amount in the hope
> > # that the mount() call will try to lock s_umount /after/ umount has already
> > # taken it.
> > $UMOUNT_PROG $SCRATCH_MNT &
> > sleep 0.01s ; _scratch_mount
> > wait
> > 
> > As a result, we hit this issue because the mnt_count is bigger than the
> > expected one in propagate_mount_busy() [1]:
> > 
> > 	} else {
> > 		smp_mb(); // paired with __legitimize_mnt()
> > 		shrink_submounts(mnt);
> > 		retval = -EBUSY;
> > 		if (!propagate_mount_busy(mnt, 2)) {
> > 			umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC);
> > 			retval = 0;
> > 		}
> > 	}
> > 
> > 
> > [   71.347372] pid 3762 do_umount():2022 finished:  mnt_get_count(mnt) 3
> > 
> > But when I try to understand what is going on during mount, I can see that I
> > can mount the same file system instance multiple times, even on the same
> > mount point:
> 
> The new mount API has always allowed for this, whereas the old mount(2)
> API doesn't. There's no reason to not allow this.

OK. I see.
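
Just to make sure I understand this point, here is how I picture it: with the new
mount API the same superblock can be attached repeatedly, even on the same mount
point. A rough, untested sketch (the source string and the name=admin option are
only the values from my test config above; glibc has no wrappers for these
syscalls, so raw syscall(2) is used):

/* Sketch: attach the same filesystem instance twice on the same mount point
 * via the new mount API (fsopen/fsconfig/fsmount/move_mount). Error handling
 * is reduced to the bare minimum; this is an illustration, not fstests code. */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <linux/mount.h>

static int attach(const char *where)
{
	int fsfd, mfd, ret;

	fsfd = syscall(SYS_fsopen, "ceph", 0);
	if (fsfd < 0)
		return -1;

	syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_STRING, "source",
		"192.168.1.213:6789:/scratch", 0);
	syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_STRING, "name", "admin", 0);
	syscall(SYS_fsconfig, fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);

	mfd = syscall(SYS_fsmount, fsfd, 0, 0);
	close(fsfd);
	if (mfd < 0)
		return -1;

	/* Attach the new mount on top of 'where'; doing this twice stacks two
	 * mounts of the same superblock on the same mount point. */
	ret = syscall(SYS_move_mount, mfd, "", AT_FDCWD, where,
		      MOVE_MOUNT_F_EMPTY_PATH);
	close(mfd);
	return ret;
}

int main(void)
{
	const char *mnt = "/mnt/cephfs/scratch";

	if (attach(mnt) || attach(mnt))
		perror("attach");
	return 0;
}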

So, finally, the main problem behind the current generic/604 failure is the
incorrect interaction of the mount and unmount logic in the CephFS code. Somehow,
the mount logic makes mnt_count bigger than the value the unmount path expects.
What could be a clean/correct solution to this issue from the VFS point of view?
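
For reference, the race that generic/604 provokes boils down to something like the
standalone sketch below: an unmount racing with a slightly delayed mount of the
same filesystem. Paths and options are the ones from my test config above; a real
setup will likely need more ceph mount options (e.g. a secret), so please treat it
as an illustration only:

/* Sketch of the generic/604 race: "$UMOUNT_PROG $SCRATCH_MNT &" in a child
 * process, "sleep 0.01s; _scratch_mount" in the parent. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mount.h>
#include <sys/wait.h>

#define SRC "192.168.1.213:6789:/scratch"
#define MNT "/mnt/cephfs/scratch"

int main(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		/* child: background unmount */
		if (umount(MNT) != 0)
			fprintf(stderr, "umount: %s\n", strerror(errno));
		_exit(0);
	}

	/* parent: give the unmount a small head start, then mount again */
	usleep(10000);
	if (mount(SRC, MNT, "ceph", 0, "name=admin") != 0)
		fprintf(stderr, "mount: %s\n", strerror(errno));

	waitpid(pid, NULL, 0);
	return 0;
}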

Thanks,
Slava.



