Re: [PATCH blktests] block/005|008: double timeout on machines with more than 128 CPUs

Hi Shinichiro,

 ---- On Wed, 25 Jun 2025 20:20:50 +0800  Shinichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> wrote --- 
 > Cc+: Johannes,
 > 
 > On Jun 24, 2025 / 20:10, Li Chen wrote:
 > > From: Li Chen <chenl311@xxxxxxxxxxxxxxx>
 > > 
 > > The current hard-coded timeout for block/005 and block/008 is 900 s.  On large
 > > systems (e.g. 256 CPUs) the test spawns one fio job per CPU, and each job
 > > issues 1 GiB of random I/O.  Total workload scales linearly with
 > > the CPU count, so the original 900 s window is often insufficient for
 > > high-core machines and causes false failures:
 > > 
 > >     fio did not finish after 900 seconds!
 > > 
 > > To keep the logic simple while avoiding unnecessary test flakiness, bump
 > > the timeout to 1800 s whenever the system has more than 128 online CPUs.
 > > Smaller systems continue to use the original 900 s limit.
 > 
 > Hello Li, thank you for the patch and for pointing out the problem. Your idea to
 > extend the timeout can be a solution. But I wonder whether we would need to
 > extend the timeout value again when machines have even more CPUs in the future.
 > 
 > On the other hand, I can think of another idea. How about capping the number of
 > jobs at a specific value? According to the blktests commit 8fc7ca8300cd
 > ("tests: use nproc to get number of CPUs for fio jobs"), the fio option
 > --numjobs="${nproc}" in _run_fio_rand_io() was introduced for the workloads
 > which "just want some IO". So, I think it is acceptable to cap numjobs at
 > some value, such as 128. Based on this idea, I created the patch below. Could
 > you check whether this approach avoids the problem on your system?
 > 
 > 
 > diff --git a/common/fio b/common/fio
 > index 91f4b23..f4965db 100644
 > --- a/common/fio
 > +++ b/common/fio
 > @@ -204,10 +204,12 @@ _fio_opts_to_min_io() {
 >  # Wrapper around _run_fio used if you need some I/O but don't really care much
 >  # about the details
 >  _run_fio_rand_io() {
 > -    local bs
 > +    local bs nr_jobs
 >  
 >      bs=$(_fio_opts_to_min_io "$@") || return 1
 > -    _run_fio --bs="$bs" --rw=randread --norandommap --numjobs="$(nproc)" \
 > +    nr_jobs=$(nproc)
 > +    ((nr_jobs > 128)) && nr_jobs=128
 > +    _run_fio --bs="$bs" --rw=randread --norandommap --numjobs="$nr_jobs" \
 >          --name=reads --direct=1 "$@"

Yes, this looks better. I'll try this patch when the machine is available, likely next week.
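
As a rough sketch of the capping logic in the patch above, pulled out into a
standalone helper so it can be checked without running fio: the name cap_jobs
and its optional second argument are hypothetical and not part of blktests;
only the 128 default and the nproc-based job count come from the patch.

```shell
#!/bin/bash

# Hypothetical helper (not in blktests): print the job count to use,
# limited to a maximum. The default cap of 128 mirrors the patch above.
cap_jobs() {
    local nr_jobs=$1
    local max_jobs=${2:-128}

    # Clamp only when the requested count exceeds the cap.
    ((nr_jobs > max_jobs)) && nr_jobs=$max_jobs
    echo "$nr_jobs"
}

# Usage: derive the fio job count from the number of online CPUs.
nr_jobs=$(cap_jobs "$(nproc)")
echo "would run fio with --numjobs=$nr_jobs"
```

If each job still issues 1 GiB, a 256-CPU machine would then issue about
128 GiB in total rather than 256 GiB, and machines with 128 or fewer CPUs
would behave as before.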

Regards,
Li



