Cc+: Johannes,

On Jun 24, 2025 / 20:10, Li Chen wrote:
> From: Li Chen <chenl311@xxxxxxxxxxxxxxx>
>
> The current hard-coded timeout for block/005 and 008 is 900 s. On large systems
> (e.g. 256 C) the test spawns one fio job per CPU and therefore
> issues 1 GiB of random I/O per job. Total workload scales linearly with
> the CPU-count, so the original 900 s window is often insufficient for
> high-core machines and causes false failures:
>
>     fio did not finish after 900 seconds!
>
> To keep the logic simple while avoiding unnecessary test flakiness, bump
> the timeout to 1800 s whenever the system has more than 128 online CPUs.
> Smaller systems continue to use the original 900 s limit.

Hello Li, thank you for the patch and for pointing out the problem.

Your idea to extend the timeout can be a solution. But I wonder whether we
would need to extend the timeout value again when systems have even more CPUs
in the future.

On the other hand, I can think of another idea: how about capping the number
of jobs at a specific value? According to the blktests commit 8fc7ca8300cd
("tests: use nproc to get number of CPUs for fio jobs"), the fio option
--numjobs="${nproc}" in _run_fio_rand_io() was introduced for the workloads
which "just want some IO". So, I think it is acceptable to cap numjobs at some
number, such as 128.

Based on this idea, I created the patch below. Could you check whether this
approach avoids the problem on your system?

diff --git a/common/fio b/common/fio
index 91f4b23..f4965db 100644
--- a/common/fio
+++ b/common/fio
@@ -204,10 +204,12 @@ _fio_opts_to_min_io() {
 # Wrapper around _run_fio used if you need some I/O but don't really care much
 # about the details
 _run_fio_rand_io() {
-	local bs
+	local bs nr_jobs
 
 	bs=$(_fio_opts_to_min_io "$@") || return 1
-	_run_fio --bs="$bs" --rw=randread --norandommap --numjobs="$(nproc)" \
+	nr_jobs=$(nproc)
+	((nr_jobs > 128)) && nr_jobs=128
+	_run_fio --bs="$bs" --rw=randread --norandommap --numjobs="$nr_jobs" \
 		--name=reads --direct=1 "$@"
 }
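
For illustration only, the capping logic in the patch above can be tried
outside of blktests with a small standalone sketch. The helper names
cap_nr_jobs and total_io_gib are hypothetical and do not exist in blktests,
and the 1 GiB per job figure is taken from the quoted commit message rather
than measured.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; they are not part of blktests.

# Clamp a job count to a ceiling (default 128, the value proposed above).
cap_nr_jobs() {
	local nr_jobs=$1 max_jobs=${2:-128}

	((nr_jobs > max_jobs)) && nr_jobs=$max_jobs
	echo "$nr_jobs"
}

# Total random I/O in GiB, assuming a fixed amount per fio job
# (1 GiB per job, as stated in the quoted commit message).
total_io_gib() {
	local nr_jobs=$1 gib_per_job=${2:-1}

	echo $((nr_jobs * gib_per_job))
}

# Compare the uncapped and capped workload for a few CPU counts.
for cpus in 8 128 256 512; do
	capped=$(cap_nr_jobs "$cpus")
	echo "cpus=$cpus uncapped=$(total_io_gib "$cpus")GiB" \
		"jobs=$capped capped=$(total_io_gib "$capped")GiB"
done
```

Under these assumptions, a 256-CPU system would run 128 jobs and issue
128 GiB in total instead of 256 GiB, and the total would stay at 128 GiB for
any larger CPU count, so the 900 s timeout would not need to grow with the
machine size.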