[PATCH 07/28] check-parallel: adjust concurrency according to CPU count

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 17 Apr 2025 13:00:48 +1000

From: Dave Chinner <dchinner@xxxxxxxxxx>

Concurrency is currently hard coded at 64 worker threads. This is
too many for small CPU count machines; the idea is to create a
sustained load of roughly one test per CPU as they are mostly single
threaded/single process tests. The number "64" was chosen because
I've been developing this functionality on a 64p VM.

Rather than hard coding the concurrency, probe the number of CPUs
configured in the system and create that many running contexts as
the default concurrency to use.
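The patch takes its default from `getconf _NPROCESSORS_CONF`, which counts every configured CPU in the machine and does not look at the process affinity mask. That means a run started under `numactl -C 4-7` still gets one runner per machine CPU unless `-t` is given, as in the runs below. As a hedged sketch only, a hypothetical `default_runners` helper (not part of check-parallel) could prefer GNU `nproc`, which reports the CPUs available to the current process:

```shell
#!/bin/bash
# Hypothetical helper, not part of check-parallel: choose a default
# runner count from the CPUs this process is allowed to run on.
default_runners()
{
	local n

	# GNU nproc honours the scheduler affinity mask, so under
	# "numactl -C 4-7" it should report 4.
	n=$(nproc 2>/dev/null)
	# Fall back to the online CPU count if nproc is unavailable.
	if [ -z "$n" ]; then
		n=$(getconf _NPROCESSORS_ONLN)
	fi
	echo "$n"
}

echo "configured CPUs: $(getconf _NPROCESSORS_CONF)"
echo "online CPUs:     $(getconf _NPROCESSORS_ONLN)"
echo "usable CPUs:     $(default_runners)"
```

Comparing the three values inside and outside a `numactl -C` wrapper shows which count each method follows.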

Further, add a CLI option to specify the number of threads to run so
that we can over- or under-commit the CPU resources to enable direct
benchmarking of performance with different levels of concurrency.
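The patch validates `-t` with `[[ $runners -le 0 || $runners -gt 1024 ]]`, which hands the raw string to bash arithmetic evaluation, so a malformed value such as `4x` can produce an arithmetic error rather than the usage message. A minimal sketch, assuming stricter input checking is wanted: the hypothetical `valid_thread_count` helper rejects anything that is not a plain decimal number before doing arithmetic. The 1..1024 bounds come from the patch; the `nproc` default is an assumption, not the patch's code.

```shell
#!/bin/bash
# Hypothetical sketch, not the patch's code: parse "-t <n>" and reject
# anything that is not a plain decimal number in the range 1..1024.
valid_thread_count()
{
	local n=$1

	# At most four digits, so oversized inputs never reach arithmetic.
	[[ $n =~ ^[0-9]{1,4}$ ]] || return 1
	# Force base 10 so that "08" is not parsed as a bad octal number.
	(( 10#$n >= 1 && 10#$n <= 1024 ))
}

runners=$(nproc)	# assumed default: CPUs available to this process
while [ $# -gt 0 ]; do
	case "$1" in
	-t)	runners=$2; shift ;;
	esac
	shift
done

if ! valid_thread_count "$runners"; then
	echo "Invalid thread specification: $runners"
	exit 1
fi
echo "Running with $runners threads"
```

Checking the format first keeps all error handling on the usage path instead of relying on how bash reports a failed arithmetic expansion.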

Let's use that capability to show how much check-parallel can
benefit small systems. First, run all the tests with a single check
execution thread, with numactl restricting it to 4 CPUs so that the
maximum CPU usage is that of a small 4p machine:

$ time sudo numactl -C 4-7 ./check-parallel -D /mnt/xfs -t 1 -g quick -s xfs -x dump -X generic/531
Runner 0 Failures:  generic/504
Tests run: 921
Tests _notrun: 272
Failure count: 2
.....

real    61m31.362s
user    0m0.029s
sys     0m0.059s

The quick group on XFS takes *over an hour* to run.

Now keep the same 4 CPU restriction, but run with 8 test execution
threads to ensure the 4 CPUs are fully utilised for most of the test
run:

$ time sudo numactl -C 4-7 ./check-parallel -D /mnt/xfs -t 8 -g quick -s xfs -x dump -X generic/531
Runner 7 Failures:  generic/504
Tests run: 921
Tests _notrun: 145
Failure count: 1
.....

real    17m33.124s
user    0m0.009s
sys     0m0.017s

The same test run takes only 17m33s. The same number of tests were
run and the same failures occurred. [ Ignore the differences in the
notrun/failure counts: the multi-file aggregation currently doesn't
work correctly for the single log file case. ]
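The 8 thread run pins execution to CPUs 4-7 and runs two test threads per CPU. A minimal sketch, assuming one wants to derive the `-t` value from the same CPU list passed to `numactl -C`: the hypothetical `count_cpus` helper only understands comma-separated CPU numbers and `lo-hi` ranges, not every CPU list form numactl accepts, and the overcommit factor of 2 is simply the ratio used in the run above.

```shell
#!/bin/bash
# Hypothetical helper: count the CPUs in a list such as "4-7" or
# "0,2,8-11" so that "-t" can be derived from the numactl CPU list.
# Only plain numbers and lo-hi ranges are handled.
count_cpus()
{
	local list=$1 part lo hi total=0
	local IFS=,

	for part in $list; do
		if [[ $part == *-* ]]; then
			lo=${part%-*}
			hi=${part#*-}
			total=$(( total + hi - lo + 1 ))
		else
			total=$(( total + 1 ))
		fi
	done
	echo "$total"
}

cpus="4-7"
factor=2	# assumed overcommit: two test threads per CPU
threads=$(( $(count_cpus "$cpus") * factor ))
echo "numactl -C $cpus ./check-parallel -t $threads ..."
```

For the `4-7` list this yields 8 threads, matching the second run; a factor of 1 gives one thread per pinned CPU instead.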

That's a ~72% reduction in test runtime on a 4 CPU system. Measured
the other way, it's a ~3.5x improvement in runtime scalability:
going from 1 to 4 CPUs used for test execution (a 4x increase) gives
a 3.5x speedup when we go from check to check-parallel.
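The reduction and speedup figures can be recomputed from the two `real` times reported above. A small sketch that uses awk for the floating point arithmetic; `to_seconds` and `compare_runs` are illustrative names, not part of check-parallel:

```shell
#!/bin/bash
# Convert a bash "time" value such as "61m31.362s" into seconds.
to_seconds()
{
	local t=${1%s}

	awk -v m="${t%%m*}" -v s="${t#*m}" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# Print the runtime reduction and speedup of run $2 relative to run $1.
compare_runs()
{
	awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" 'BEGIN {
		printf "reduction: %.1f%%\n", (1 - b / a) * 100
		printf "speedup:   %.2fx\n", a / b
	}'
}

compare_runs 61m31.362s 17m33.124s	# -t 1 vs -t 8 "real" times
```

With 3691.362s and 1053.124s this prints a reduction of 71.5% and a speedup of 3.51x, consistent with the ~72% and ~3.5x quoted.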

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 check-parallel | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/check-parallel b/check-parallel
index cb5d6aedf..0649a417f 100755
--- a/check-parallel
+++ b/check-parallel
@@ -10,7 +10,7 @@
 # the loop devices.
 
 basedir=""
-runners=64
+runners=$(getconf _NPROCESSORS_CONF)
 runner_list=()
 runtimes=()
 show_test_list=
@@ -30,6 +30,7 @@ usage()
 
 check options
     -D <dir>		Directory to run in
+    -t <n>		Number of concurrent tests to run
     -n			Output test list, do not run tests
     -r			randomize test order
     --exact-order	run tests in the exact order specified
@@ -81,6 +82,7 @@ while [ $# -gt 0 ]; do
 	-\? | -h | --help) usage ;;
 
 	-D)	basedir=$2; shift ;;
+	-t)	runners=$2; shift ;;
 	-g)	_tl_setup_group $2 ; shift ;;
 	-e)	_tl_setup_exclude_tests $2 ; shift ;;
 	-E)	_tl_setup_exclude_file $2 ; shift ;;
@@ -111,6 +113,11 @@ if [ ! -d "$basedir" ]; then
 	echo "Invalid basedir specification"
 	usage
 fi
+if [[ $runners -le 0 || $runners -gt 1024 ]]; then
+	echo "Invalid thread specification: $runners"
+	usage
+fi
+
 if [ -d "$basedir/runner-0/" ]; then
 	prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
 fi
-- 
2.45.2