On 08/09/2025 11:36, John Ogness wrote:
> There are still reasons why CLOCK_MONOTONIC_RAW might be
> interesting. For example, if you want a very stable time source to
> compare intervals, but do not care so much about the real world time
> lengths of those intervals (i.e. where is the greatest latency vs. what
> is the value of the greatest latency). Although even here, I doubt
> CLOCK_MONOTONIC_RAW has a practical advantage over CLOCK_MONOTONIC.

In fact, I'm just trying to compare the run-time of two minor variations
of the same program (testing micro-optimizations). Absolute run-time is
not really important; what I really want to know is: does v2 run faster
or slower than v1?

This is the framework I'm using at this point:

#include <stdio.h>
#include <time.h>

extern void my_code(void);

static long loop(int log2)
{
	int n = 1 << log2;
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < n; ++i)
		my_code();
	clock_gettime(CLOCK_MONOTONIC, &t1);

	long d = (t1.tv_sec - t0.tv_sec)*1000000000L + (t1.tv_nsec - t0.tv_nsec);
	long t = d >> log2;	/* average ns per call */
	return t;
}

int main(void)
{
	long t, min = loop(4);	/* warm up the caches (16 calls) */

	for (int i = 0; i < 20; ++i)
		if ((t = loop(8)) < min)
			min = t;

	printf("MIN=%ld\n", min);
	return 0;
}

Basically:
- warm up the caches
- run my_code() 256 times && compute the average run-time
- repeat 20 times to find the MINIMUM average run-time

When my_code() is a trivial computational loop such as:

	mov	$(1<<12), %eax
1:	dec	%ecx
	dec	%ecx
	dec	%eax
	jnz	1b
	ret

then running the benchmark 1000 times returns the same value 1000 times:
MIN=2737

Obviously, the program I'm working on is a bit more complex, but barely:
- no system calls, no library calls
- just simple bit twiddling
- tiny code, tiny data structures, everything fits in L1

$ size a.out
   text	   data	    bss	    dec	    hex	filename
   8549	    632	   1072	  10253	   280d	a.out

When I run the benchmark 1000 times, there are some large outliers:
MIN_MIN=2502
MAX_MIN=2774

NOTE: 95% of the results are below 2536, but the worst 1% (the 10 worst
runs) are really bad (2646-2774).

How do I get repeatable results? Random ~10% outliers break my ability
to measure the impact of micro-optimizations expected to provide 0-3%
improvements :(

For reference, the script launching the benchmark does:

echo -1 > /proc/sys/kernel/sched_rt_runtime_us

for I in 0 1 2 3; do
	echo userspace > /sys/devices/system/cpu/cpu$I/cpufreq/scaling_governor
done
sleep 0.25

for I in 0 1 2 3; do
	echo 3000000 > /sys/devices/system/cpu/cpu$I/cpufreq/scaling_setspeed
done
sleep 0.25

for I in $(seq 1 1000); do
	chrt -f 99 taskset -c 2 ./a.out < $1
done

for I in 0 1 2 3; do
	echo schedutil > /sys/devices/system/cpu/cpu$I/cpufreq/scaling_governor
done

echo 950000 > /proc/sys/kernel/sched_rt_runtime_us

I've run out of ideas to identify other sources of variance.
(I ran everything in single-user mode with nothing else running.)

Perhaps with perf I could identify the source of the stalls or bubbles?

Hoping someone can point me in the right direction.

Regards
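
P.S. To be concrete about the perf idea: this is roughly the kind of run
I had in mind, e.g. replacing the a.out line inside the 1000-iteration
loop above. Just a rough sketch; I'm assuming the usual hardware events
are available on this machine:

perf stat -e cycles,instructions,branch-misses,cache-misses,context-switches \
	chrt -f 99 taskset -c 2 ./a.out < $1

My thinking is: if the slow runs show extra cycles with roughly the same
instruction count, that would point at stalls rather than extra work,
and a non-zero context-switch count would point at the scheduler. Does
that sound like a reasonable way to narrow it down?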