Hi Shinichiro,

 ---- On Wed, 25 Jun 2025 20:20:50 +0800 Shinichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> wrote ---
 > Cc+: Johaness,
 >
 > On Jun 24, 2025 / 20:10, Li Chen wrote:
 > > From: Li Chen <chenl311@xxxxxxxxxxxxxxx>
 > >
 > > The current hard-coded timeout for block/005 and 008 is 900 s. On large systems
 > > (e.g. 256 C) the test spawns one fio job per CPU and therefore
 > > issues 1 GiB of random I/O per job. Total workload scales linearly with
 > > the CPU-count, so the original 900 s window is often insufficient for
 > > high-core machines and causes false failures:
 > >
 > > fio did not finish after 900 seconds!
 > >
 > > To keep the logic simple while avoiding unnecessary test flakiness, bump
 > > the timeout to 1800 s whenever the system has more than 128 online CPUs.
 > > Smaller systems continue to use the original 900 s limit.
 >
 > Hello Li, thank you for the patch, and pointing out the problem. Your idea to
 > extend the timeout can be a solution. But I wonder we may need to extend the
 > timeout value again when we have more CPUs in the future.
 >
 > On the other hand, I can think of another idea. How about to cap the number of
 > jobs with a specific number? According to the blktests commit 8fc7ca8300cd
 > ("tests: use nproc to get number of CPUs for fio jobs"), the fio option
 > --numjobs="${nproc}" in _run_fio_rand_io() was introduced for the workloads
 > which "just want some IO". So, I think it is allowed to cap the numjobs with
 > some number, such as 128. Based on this idea, I created the patch below. Could
 > you try out if this approach avoids the problem on your system?
 >
 >
 > diff --git a/common/fio b/common/fio
 > index 91f4b23..f4965db 100644
 > --- a/common/fio
 > +++ b/common/fio
 > @@ -204,10 +204,12 @@ _fio_opts_to_min_io() {
 >  # Wrapper around _run_fio used if you need some I/O but don't really care much
 >  # about the details
 >  _run_fio_rand_io() {
 > -	local bs
 > +	local bs nr_jobs
 >
 >  	bs=$(_fio_opts_to_min_io "$@") || return 1
 > -	_run_fio --bs="$bs" --rw=randread --norandommap --numjobs="$(nproc)" \
 > +	nr_jobs=$(nproc)
 > +	((nr_jobs > 128)) && nr_jobs=128
 > +	_run_fio --bs="$bs" --rw=randread --norandommap --numjobs="$nr_jobs" \
 >  		--name=reads --direct=1 "$@"

Yes, this looks better. I'll try this patch when the machine is available, likely next week.

Regards,
Li
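
For reference, the capping logic from the quoted patch can be exercised on its own, without fio or a large machine. This is only a minimal sketch: the helper name _cap_nr_jobs and its two optional arguments are hypothetical and not part of blktests, and the default cap of 128 is simply the value proposed in the patch.

```shell
# Hypothetical helper, not part of blktests: clamp a job count to a cap.
# $1: requested job count (defaults to the online CPU count from nproc)
# $2: upper bound (defaults to 128, the value used in the quoted patch)
_cap_nr_jobs() {
	local nr_jobs="${1:-$(nproc)}"
	local max_jobs="${2:-128}"

	# Only lower the count; systems at or below the cap are unaffected.
	if [ "$nr_jobs" -gt "$max_jobs" ]; then
		nr_jobs=$max_jobs
	fi
	echo "$nr_jobs"
}

_cap_nr_jobs 256   # prints 128
_cap_nr_jobs 64    # prints 64
```

With one fio job per CPU and 1 GiB of I/O per job, as described in the original patch description, a 256-CPU system would drop from 256 jobs to 128 under this cap, so the total workload stops growing once the CPU count passes 128.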