On 9/9/25 16:08, Marc Gonzalez wrote:

> My hunch is frequency dropped to ~2.7 GHz for the duration of the benchmark.
>
> Maybe I should lower the frequency to 2 GHz.
> But again, why would my code force throttling and not the toy code?
> One possible reason is my code reaches 3.5 IPC, while the toy code
> remains at 1.5 IPC (with micro-op fusion).
>
> It's a stretch, but easy to test.
>
> I will measure CPU cycles, to see if the increased run-time
> corresponds with a change in CPU cycles.

# dmesg | grep -i tsc
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 3292.563 MHz processor
[    0.038014] TSC deadline timer available
[    0.093396] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2f75db292c8, max_idle_ns: 440795321216 ns
[    0.178415] clocksource: Switched to clocksource tsc-early
[    1.272272] tsc: Refined TSC clocksource calibration: 3292.377 MHz
[    1.272280] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2f752b31fa5, max_idle_ns: 440795265170 ns
[    1.272305] clocksource: Switched to clocksource tsc

$ lscpu | grep -o '[a-z_]*tsc[a-z_]*'
tsc
rdtscp
constant_tsc
nonstop_tsc
tsc_deadline_timer
tsc_adjust

static void loop(int log2)
{
    int n = 1 << log2;
    struct timespec t0, t1;
    clock_gettime(CLK, &t0);
    long c0 = __rdtsc();
    for (int i = 0; i < n; ++i) my_code();
    long c1 = __rdtsc();
    clock_gettime(CLK, &t1);
    long d = (t1.tv_sec - t0.tv_sec)*1000000000L + (t1.tv_nsec - t0.tv_nsec);
    long t = d >> log2;
    long c = c1-c0;
    long f = c*1000000/d;
    printf("D=%ld C=%ld F=%ld T=%ld N=%d\n", d, c, f, t, n);
}

(Looks like RDTSCP might have been preferable over RDTSC)

I lowered the frequency of all cores to 1.5 GHz

MIN RUNTIME: D=327944847 C=1079700683 F=3292323 T=5004 N=65536
MAX RUNTIME: D=369923901 C=1217909284 F=3292323 T=5644 N=65536

Looks like the TSC ticks at 3.3 GHz (core nominal freq) even if I manually
lower the core frequency.

I need to figure out how 'perf stat' does it.
Probably reads a different MSR.

Changing /sys/devices/system/clocksource/clocksource0/current_clocksource
to hpet doesn't really change anything.

TODO: find cycle count MSR.
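
One way to get real core cycles without hunting for the MSR by hand: as far as I know, 'perf stat' gets its "cycles" count through the perf_event_open(2) syscall, and the kernel programs the PMU counter behind it. Below is a minimal sketch under that assumption. The helper names (cycles_open, cycles_start, cycles_stop, freq_khz) are made up for illustration, and the counter may be unavailable in a VM or when /proc/sys/kernel/perf_event_paranoid is too strict.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: open a counter for core clock cycles spent by the
 * calling thread in user space, on whichever CPU it runs.
 * Returns a file descriptor, or -1 on failure. */
static int cycles_open(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;          /* start stopped, enable explicitly */
    attr.exclude_kernel = 1;    /* count user-space cycles only */
    attr.exclude_hv = 1;
    /* glibc has no wrapper for perf_event_open, hence syscall(2).
     * pid=0: this thread, cpu=-1: any CPU, group_fd=-1: no group. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Zero the counter and start counting. */
static void cycles_start(int fd)
{
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}

/* Stop counting and return the count, or -1 if the read fails. */
static int64_t cycles_stop(int fd)
{
    uint64_t count;
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        return -1;
    return (int64_t)count;
}

/* Effective frequency in kHz, same formula as F in loop():
 * cycles per nanosecond, scaled by 1e6. */
static long freq_khz(long cycles, long ns)
{
    assert(ns > 0);
    return cycles * 1000000 / ns;
}
```

In loop(), cycles_start(fd) would go right before the for loop and cycles_stop(fd) right after it, with the result passed to freq_khz() along with d. If the counter really tracks unhalted core cycles, F should come out near 1500000 with the cores held at 1.5 GHz, rather than the 3292323 that the TSC reports.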