Re: [PATCHES v3][RFC][CFT] mount-related stuff

On Wed, Sep 03, 2025 at 07:14:29PM +0100, Al Viro wrote:
> On Wed, Sep 03, 2025 at 07:47:18AM -0700, Linus Torvalds wrote:
> > On Tue, 2 Sept 2025 at 21:54, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > If nobody objects, this goes into #for-next.
> > 
> > Looks all sane to me.
> > 
> > What was the issue with generic/475? I have missed that context...
> 
> At some point, testing that branch caught a failure in generic/475.
> Unfortunately, it wouldn't trigger on every run, so it's possible
> the failure had started earlier.
> 
> When I went digging, I found it with the trixie kernel (6.12.38 in
> that kvm at the time) rebuilt with my local config; the config used
> by Debian didn't trigger it.  Bisecting by config converged on
> PREEMPT_VOLUNTARY (no visible failures) changed to PREEMPT (failures
> happen with odds a bit below 10%).
> 
> There are several failure modes; the most common is something like
> ...
> echo '1' 2>&1 > /sys/fs/xfs/dm-0/error/fail_at_unmount
> echo '0' 2>&1 > /sys/fs/xfs/dm-0/error/metadata/EIO/max_retries
> echo '0' 2>&1 > /sys/fs/xfs/dm-0/error/metadata/EIO/retry_timeout_seconds
> fsstress: check_cwd stat64() returned -1 with errno: 5 (Input/output error)
> fsstress: check_cwd failure
> fsstress: check_cwd stat64() returned -1 with errno: 5 (Input/output error)
> fsstress: check_cwd failure
> fsstress: check_cwd stat64() returned -1 with errno: 5 (Input/output error)
> fsstress: check_cwd failure
> fsstress: check_cwd stat64() returned -1 with errno: 5 (Input/output error)
> fsstress: check_cwd failure
> fsstress killed (pid 10824)
> fsstress killed (pid 10826)
> fsstress killed (pid 10827)
> fsstress killed (pid 10828)
> fsstress killed (pid 10829)
> umount: /home/scratch: target is busy.
> unmount failed
> umount: /home/scratch: target is busy.
> umount: /dev/sdb2: not mounted.
> 
> at the end of the output (that's mainline v6.12); other variants
> include e.g. a quietly hanging 'udevadm wait' (killable).

Huh. I've been seeing that 'udevadm wait' hang on DM devices for a
while now when using my check-parallel variant of fstests.

Sometimes it reproduces on every run, so it can take under 5 minutes
to hit on a 64-way concurrent test run.  It also affects most of the
tests that use DM devices (which all call 'udevadm wait'), not just
generic/475.  Running 'pkill udevadm' is usually enough to get
everything unstuck, and the tests then continue running.
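
As a stopgap, a trivial watchdog keeps a long run moving; a sketch
only (the 60s interval is an arbitrary choice, and note this will
also kill waits that have only just started):

    # periodically kill stuck 'udevadm wait' processes so that
    # blocked tests can make progress again
    while sleep 60; do
        pkill -f 'udevadm wait'
    done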

However, I haven't been able to isolate the problem, as running the
tests single-threaded (i.e. normal fstests behaviour) never
reproduces it, which is bloody annoying...

> It's bloody annoying to bisect - a 100-iteration run takes about
> 2.5 hours, and usually a failure happens in the first 40 minutes
> or so, or not at all...

That seems to be the case for me, too. If the default XFS config
completes (~8 minutes for the auto group), then the rest of the
configs also complete (~2 hours for a dozen different mkfs/mount
configs to run through the auto group tests).
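
FWIW, a soak run like that doesn't need anything fancy; a sketch,
assuming an fstests checkout (the path is a placeholder, './check'
is the standard fstests runner):

    #!/bin/sh
    # run generic/475 up to 100 times, stopping at the first failure
    cd /path/to/xfstests || exit 1
    for i in $(seq 1 100); do
        echo "iteration $i"
        ./check generic/475 || break
    done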

> PREEMPT definitely is the main contributor to the failure odds...

My test kernels are built with PREEMPT enabled, so that may well be
a contributing factor:

CONFIG_PREEMPT_BUILD=y
CONFIG_ARCH_HAS_PREEMPT_LAZY=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_LAZY is not set
# CONFIG_PREEMPT_RT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
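
Note that with CONFIG_PREEMPT_DYNAMIC=y the effective preemption
model can be selected at boot or flipped at runtime, which should
make A/B testing the preemption sensitivity quicker than rebuilding.
Roughly (assuming debugfs is mounted at /sys/kernel/debug):

    # select at boot: preempt=none | preempt=voluntary | preempt=full

    # or inspect and switch the current model at runtime:
    cat /sys/kernel/debug/sched/preempt   # e.g. "none voluntary (full)"
    echo voluntary > /sys/kernel/debug/sched/preempt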

> I'm doing
> a bisection between v6.12 and v6.10 at the moment, will post when I get
> something more useful...

check-parallel is relatively new, so unfortunately I don't have any
idea when this behaviour might have been introduced.

FWIW, 'udevadm wait' is relatively new behaviour for both udev and
fstests. It was introduced into fstests for check-parallel to
replace 'udevadm settle', i.e. to wait for a specific device to
reach a particular state rather than waiting for the entire udev
queue to drain.  check-parallel uses hundreds of block devices and
filesystems at the same time, resulting in multiple mount/unmount
operations every second.  Hence waiting on the udev queue to drain
can take a -long- time, but maybe waiting on the device node state
change itself is racy (i.e. it might be a udevadm or DM bug) and
PREEMPT is opening up that window.
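
Roughly, the difference between the two (the device path below is
purely illustrative; fstests waits on whatever DM node it has just
created):

    # old behaviour: block until the whole udev event queue drains
    udevadm settle --timeout=30

    # new behaviour: block only until this one device node exists
    # and has been initialised by udev
    udevadm wait --timeout=30 /dev/mapper/error-test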

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



