Re: ext4 v6.15-rc2 baseline

"Theodore Ts'o" <tytso@xxxxxxx> · Thu, 17 Apr 2025 22:56:23 -0500

On Thu, Apr 17, 2025 at 06:42:25PM -0700, Luis Chamberlain wrote:
> 
> ext4_defaults: 793 tests, 2 failures, 259 skipped, 10521 seconds
>   Failures: generic/223 generic/741

generic/223 is excluded in my tests.  From [1]:

// generic/223 tests file alignment, which works on ext4 only by
// accident because we're not RAID stripe aware yet, and works at all
// because we have bias towards aligning on power-of-two block numbers.
// It is a flaky test for some configurations, so skip it.
generic/223

[1] https://github.com/tytso/xfstests-bld/blob/master/test-appliance/files/root/fs/ext4/exclude
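A minimal sketch of how an exclude list like the one above narrows a
failure list. It assumes the file is just test names plus whole-line
`//` comments; the real xfstests-bld files go through additional
preprocessing that this ignores. The helpers `parse_exclude` and
`remaining_failures` are hypothetical, not part of xfstests-bld:

```python
def parse_exclude(text: str) -> set[str]:
    """Return the test names listed in an exclude file's text."""
    names = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and whole-line '//' comments (assumed format).
        if not line or line.startswith("//"):
            continue
        names.add(line)
    return names


def remaining_failures(failures: list[str], exclude_text: str) -> list[str]:
    """Return the failures that the exclude file does not cover."""
    excluded = parse_exclude(exclude_text)
    return [test for test in failures if test not in excluded]


EXCLUDE = """\
// It is a flaky test for some configurations, so skip it.
generic/223
"""

print(remaining_failures(["generic/223", "generic/741"], EXCLUDE))
# ['generic/741']
```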

generic/741 looks like some kind of device-mapper setup problem.  From
741.out.bad:

device-mapper: remove ioctl on flakey-test  failed: No such device or address
Command failed.

There's nothing interesting in generic/741, but all I can tell you is,
"it works for me"(tm).

Ran: generic/741
Passed all 1 tests

> ext4_bigalloc16k_4k: 793 tests, 26 failures, 341 skipped, 8856 seconds
>   Failures: ext4/033 generic/075 generic/082 generic/091 generic/112
>     generic/127 generic/219 generic/223 generic/230 generic/231
>     generic/232 generic/233 generic/234 generic/235 generic/263
>     generic/280 generic/381 generic/382 generic/566 generic/587
>     generic/600 generic/601 generic/681 generic/682 generic/691
>     generic/741

Hmm, some of these are because there are a bunch of tests that don't
work well when the allocation cluster size != the file system block
size.  See [2] for the tests that I exclude.  These are fundamentally
test bugs that just don't work for bigalloc's clustered allocation.

[2] https://github.com/tytso/xfstests-bld/blob/master/test-appliance/files/root/fs/ext4/cfg/bigalloc_4k.exclude

As for the rest of the bigalloc failures, some of them are hard to
diagnose because you're not saving all of the test artifacts.  In
particular, the tests which run fsx create ${seq}.*.fsx{good,bad,log}
files.  My test appliance saves them, because they are super helpful
when debugging a test failure.  kdevops apparently doesn't.

What I do is save the entire results directory, although by default I
truncate any test artifacts from passing tests to 31k (this amount is
configurable via a command line option to gce-xfstests).  This is
important because some of the artifact files are super verbose, and if you
save them all, the time to run xz on the tar file takes forever.  But
if the tests fail, they are *super* useful.
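The truncation policy described above could be sketched roughly as
follows. This is a hypothetical Python helper, not gce-xfstests code:
it assumes the artifacts for a test such as generic/091 live under
`results/generic/` with names starting with `091.`, and it keeps the
first 31 KiB of each oversized file from a passing test while leaving
failing tests' artifacts untouched.  Whether the real tool keeps the
head or the tail of a file is not stated here.

```python
import tempfile
from pathlib import Path

# Hypothetical default, mirroring the "31k" limit mentioned above.
DEFAULT_LIMIT = 31 * 1024


def truncate_passing_artifacts(results_dir: Path, passed: list[str],
                               limit: int = DEFAULT_LIMIT) -> list[Path]:
    """Cut artifacts of passing tests down to `limit` bytes.

    Assumes (hypothetically) that artifacts for "generic/091" are files in
    results_dir/generic whose names start with "091.".
    """
    truncated = []
    for test in passed:
        group, seq = test.split("/", 1)
        group_dir = results_dir / group
        if not group_dir.is_dir():
            continue
        for path in sorted(group_dir.glob(f"{seq}.*")):
            if path.is_file() and path.stat().st_size > limit:
                # Keep only the first `limit` bytes of the file.
                with open(path, "r+b") as f:
                    f.truncate(limit)
                truncated.append(path)
    return truncated


def demo() -> tuple[int, int]:
    """Build a throwaway results tree and return sizes after truncation."""
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "generic").mkdir()
        passed_log = root / "generic" / "091.0.fsxlog"  # from a passing test
        failed_log = root / "generic" / "112.0.fsxlog"  # from a failing test
        passed_log.write_bytes(b"x" * 100_000)
        failed_log.write_bytes(b"y" * 100_000)
        truncate_passing_artifacts(root, ["generic/091"])
        return passed_log.stat().st_size, failed_log.stat().st_size


print(demo())  # (31744, 100000)
```

Only passing tests are touched, so the full fsx logs from a failure
stay available for debugging.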

For the other bigalloc failures, I have a suspicion --- how big are the
TEST and SCRATCH devices that you are using?  By default, most of my
test scenarios use a "small" config which is 5G.  But for the bigalloc
tests, for the 4k block / 64k cluster size, the device needs to be at
least 20G or some of the tests will fail with ENOSPC.

Cheers,

						- Ted