Hello David :)

On 05/09/2025 10:25, Daniel Wagner wrote:
> On Fri, Sep 05, 2025 at 01:45:31AM +0200, Marc Gonzalez wrote:
>
>> Then I run this script as root:
>> #!/bin/bash
>>
>> CMD="taskset -c 1 ./a.out"
>> if [ "$1" = "fifo" ]; then CMD="chrt -f 99 $CMD"; fi
>>
>> for I in $(seq 1 30); do
>>   T0=$(date "+%s.%N")
>>   $CMD
>>   T1=$(date "+%s.%N")
>>   echo "$T1-$T0" | bc -l
>> done
>
> This setup is forking processes in order to get timestamps. This
> gives you uncontrollable variance in the measurement. Use clock_gettime
> inside your test program.

I had already tested your suggestion, and observed that it did not make
any difference. I will test again nonetheless. (I used CLOCK_MONOTONIC_RAW.)

NB: if the large variance were induced by getting timestamps outside the
program, then SCHED_OTHER runs should be similarly impacted (however,
they are not).

> Also you might want to reduce the prio to 98.

May I ask why? I'm guessing it's bad(TM) to potentially block the
migration task? I will lower the priority to 90.

Here is my current test code:

#include <stdio.h>
#include <time.h>

#define N (1 << 30)
#define G (1000*1000*1000L)

int main(void)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC_RAW, &t0);
	for (volatile int i = 0; i < N; ++i);
	clock_gettime(CLOCK_MONOTONIC_RAW, &t1);

	long diff = (t1.tv_sec*G + t1.tv_nsec) - (t0.tv_sec*G + t0.tv_nsec);
	printf("%ld\n", diff);
	return 0;
}

associated script:

#!/bin/bash
CMD="taskset -c 1 ./a.out"
if [ "$1" = "fifo" ]; then CMD="chrt -f 90 $CMD"; fi
for I in $(seq 1 30); do $CMD; done

The results are the same as before:

SCHED_OTHER
MIN=2156085411
MAX=2156133420
All within 11 ppm of the average (EXCELLENT RESULTS)

SCHED_FIFO
MIN=2205859200 (first run again, how strange)
MAX=2304868902
All within 21950 ppm of the average (TERRIBLE RESULTS)
21950 ppm = 2.2%

Looking at the distribution of SCHED_FIFO again, I notice a pattern:

 1 outlier (the first run) at the minimum = 2.20 seconds
21 (again!!) results around 2.25 seconds
 8 (again!!)
results around 2.30 seconds

BRAIN DUMP:

"First" run is meaningless. First since what? The previous batch of runs?
In that case, sleeping between runs should work around the issue?

It looks like there is some kind of 50 millisecond quantum involved,
somewhere, somehow.

$ grep HZ /boot/config-6.8.0-79-generic
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000

(Not sure what it means to have CONFIG_NO_HZ_FULL && CONFIG_HZ_1000)

This is driving me crazy. I'm all ears if anyone has other suggestions.

Regards

Raw results for reference:

SCHED_OTHER
2156120840
2156110092
2156111812
2156117629
2156111235
2156104159
2156109850
2156120644
2156100701
2156085456
2156133420
2156130993
2156087623
2156120236
2156114137
2156099823
2156122243
2156095888
2156114046
2156102910
2156120532
2156102464
2156100703
2156125228
2156119672
2156110619
2156131331
2156085411
2156098780
2156121458

SCHED_FIFO
2205859200
2254828687
2254866765
2304868902
2254849969
2253908545
2254811821
2304845457
2253814019
2254834900
2304858026
2254845905
2253835350
2254815775
2304860984
2253824572
2254816525
2254800880
2303875283
2254826189
2254813961
2304850479
2254832102
2253816865
2254820633
2304847552
2253841768
2254807812
2254810662
2303848123
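PS: in case anyone wants to reproduce the MIN/MAX/ppm figures quoted
above from the raw columns, here is a small helper sketch (not part of
my test setup; `summarize` is just a name I made up). It reads one
nanosecond reading per line on stdin and reports the worst-case
deviation from the average in ppm:

```shell
# Hypothetical helper: summarize one reading (in ns) per line on stdin.
# Reports MIN, MAX, AVG, and the largest deviation from AVG in ppm.
summarize() {
	awk '
	NR == 1 { min = max = $1 }
	{ sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
	END {
		avg = sum / NR
		dev = (max - avg > avg - min) ? max - avg : avg - min
		printf "MIN=%d MAX=%d AVG=%.0f PPM=%.0f\n", \
		       min, max, avg, 1e6 * dev / avg
	}'
}

# usage: save each batch of runs to a file, then e.g.
#   summarize < results_other.txt
```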