Re: [PATCH blktests] block/005|008: double timeout on machines with more than 128 CPUs

Shinichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> · Wed, 25 Jun 2025 12:20:50 +0000

Cc+: Johaness,

On Jun 24, 2025 / 20:10, Li Chen wrote:
> From: Li Chen <chenl311@xxxxxxxxxxxxxxx>
> 
> The current hard-coded timeout for block/005 and 008 is 900 s.  On large systems
> (e.g. 256 C) the test spawns one fio job per CPU and therefore
> issues 1 GiB of random I/O per job.  Total workload scales linearly with
> the CPU-count, so the original 900 s window is often insufficient for
> high-core machines and causes false failures:
> 
>     fio did not finish after 900 seconds!
> 
> To keep the logic simple while avoiding unnecessary test flakiness, bump
> the timeout to 1800 s whenever the system has more than 128 online CPUs.
> Smaller systems continue to use the original 900 s limit.

Hello Li, thank you for the patch and for pointing out the problem. Extending
the timeout is one possible solution, but I wonder whether we would need to
extend it again when machines with even more CPUs appear in the future.

On the other hand, I can think of another idea: how about capping the number of
jobs at a specific value? According to the blktests commit 8fc7ca8300cd
("tests: use nproc to get number of CPUs for fio jobs"), the fio option
--numjobs="$(nproc)" in _run_fio_rand_io() was introduced for the workloads
which "just want some IO". So I think it is acceptable to cap numjobs at some
value, such as 128. Based on this idea, I created the patch below. Could you
check whether this approach avoids the problem on your system?

diff --git a/common/fio b/common/fio
index 91f4b23..f4965db 100644
--- a/common/fio
+++ b/common/fio
@@ -204,10 +204,12 @@ _fio_opts_to_min_io() {
 # Wrapper around _run_fio used if you need some I/O but don't really care much
 # about the details
 _run_fio_rand_io() {
-	local bs
+	local bs nr_jobs
 
 	bs=$(_fio_opts_to_min_io "$@") || return 1
-	_run_fio --bs="$bs" --rw=randread --norandommap --numjobs="$(nproc)" \
+	nr_jobs=$(nproc)
+	((nr_jobs > 128)) && nr_jobs=128
+	_run_fio --bs="$bs" --rw=randread --norandommap --numjobs="$nr_jobs" \
 		--name=reads --direct=1 "$@"
 }
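
To check the capping logic outside blktests, the clamp can be exercised on its
own. This is only a minimal sketch: the helper name cap_jobs and its optional
second argument are hypothetical and not part of blktests; the default limit of
128 is the value used in the patch above.

```shell
#!/bin/bash
# Hypothetical helper (not part of blktests): clamp a job count to a limit.
# $1: requested number of jobs, $2: upper limit (assumed default: 128)
cap_jobs() {
	local nr_jobs=$1 max_jobs=${2:-128}

	# Same arithmetic test as in the patch: only lower the value, never raise it.
	((nr_jobs > max_jobs)) && nr_jobs=$max_jobs
	echo "$nr_jobs"
}

cap_jobs 256	# above the limit, clamped to 128
cap_jobs 64	# below the limit, passed through unchanged
```

With such a helper, the fio option could be written as
--numjobs="$(cap_jobs "$(nproc)")", which keeps the job count at the CPU count
on small systems and stops the total workload from growing beyond 128 jobs.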