Re: Unexplained variance in run-time of trivial program

On 9/9/25 16:08, Marc Gonzalez wrote:

> My hunch is frequency dropped to ~2.7 GHz for the duration of the benchmark.
> 
> Maybe I should lower the frequency to 2 GHz.
> But again, why would my code force throttling and not the toy code?
> One possible reason is my code reaches 3.5 IPC, while the toy code
> remains at 1.5 IPC (with micro-op fusion).
> 
> It's a stretch, but easy to test.
> 
> I will measure CPU cycles, to see if the increased run-time
> corresponds with a change in CPU cycles.

# dmesg | grep -i tsc
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 3292.563 MHz processor
[    0.038014] TSC deadline timer available
[    0.093396] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2f75db292c8, max_idle_ns: 440795321216 ns
[    0.178415] clocksource: Switched to clocksource tsc-early
[    1.272272] tsc: Refined TSC clocksource calibration: 3292.377 MHz
[    1.272280] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2f752b31fa5, max_idle_ns: 440795265170 ns
[    1.272305] clocksource: Switched to clocksource tsc

$ lscpu | grep -o '[a-z_]*tsc[a-z_]*'
tsc
rdtscp
constant_tsc
nonstop_tsc
tsc_deadline_timer
tsc_adjust

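#include <stdio.h>
#include <time.h>
#include <x86intrin.h>	/* __rdtsc() */

/* CLK (a clockid_t) and my_code() are defined elsewhere in the program. */
static void loop(int log2)
{
	int n = 1 << log2;
	struct timespec t0, t1;
	clock_gettime(CLK, &t0);
	long c0 = __rdtsc();
	for (int i = 0; i < n; ++i) my_code();
	long c1 = __rdtsc();
	clock_gettime(CLK, &t1);
	/* D: elapsed wall-clock time in ns */
	long d = (t1.tv_sec - t0.tv_sec)*1000000000L + (t1.tv_nsec - t0.tv_nsec);
	long t = d >> log2;	/* T: ns per iteration (d / n) */
	long c = c1-c0;		/* C: elapsed TSC ticks */
	long f = c*1000000/d;	/* F: TSC rate in kHz */
	printf("D=%ld C=%ld F=%ld T=%ld N=%d\n", d, c, f, t, n);
}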

(It looks like RDTSCP might have been preferable to RDTSC, since it
waits for all earlier instructions to execute before reading the counter.)

I lowered the frequency of all cores to 1.5 GHz:

MIN RUNTIME: D=327944847 C=1079700683 F=3292323 T=5004 N=65536
MAX RUNTIME: D=369923901 C=1217909284 F=3292323 T=5644 N=65536
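
As a sanity check on the two lines above, here is a minimal sketch that
redoes the F and T arithmetic from loop() on the reported D and C values.
The helper names tsc_khz() and ns_per_iter() are made up for this
illustration; they are not part of the benchmark.

```c
#include <assert.h>

/* TSC rate in kHz from a tick count and an elapsed time in ns.
 * Same integer arithmetic as F in loop(): ticks * 1e6 / ns.
 * (Hypothetical helper, for illustration only.) */
long long tsc_khz(long long ticks, long long ns)
{
	assert(ns > 0);
	return ticks * 1000000LL / ns;
}

/* Average ns per iteration for n = 1 << log2 iterations.
 * Same as T in loop(): a right shift divides by n. */
long long ns_per_iter(long long ns, int log2)
{
	assert(log2 >= 0 && log2 < 62);
	return ns >> log2;
}
```

Called with (C, D) from the MIN and MAX lines, tsc_khz() gives 3292323
both times, and ns_per_iter(D, 16) gives 5004 and 5644. D and C both
grow by about 12.8% from MIN to MAX, so the tick count here tracks
wall-clock time, not core cycles.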

It looks like the TSC ticks at 3.3 GHz (the core's nominal frequency)
even when I manually lower the core frequency, which is consistent with
the constant_tsc flag reported above. I need to figure out how
'perf stat' counts cycles. It probably reads a different MSR.
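
One way to get core cycles without touching MSRs directly is the
perf_event_open(2) syscall with the PERF_COUNT_HW_CPU_CYCLES hardware
event, which is the interface the perf tool is built on. The sketch below
is only an illustration under some assumptions: the names cycles_open(),
cycles_measure() and spin() are hypothetical, spin() stands in for
my_code(), and kernel-mode cycles are excluded so that it has a chance of
working without extra privileges. It returns -1 when the kernel refuses
the counter (no PMU in a VM, strict perf_event_paranoid, and so on).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a user-space core-cycle counter for the calling thread.
 * Returns a file descriptor, or -1 if the kernel refuses. */
int cycles_open(void)
{
	struct perf_event_attr attr;
	memset(&attr, 0, sizeof attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof attr;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.disabled = 1;		/* start stopped, enable explicitly */
	attr.exclude_kernel = 1;	/* assumption: user-mode cycles only */
	attr.exclude_hv = 1;
	/* pid = 0, cpu = -1: this thread, on whichever CPU it runs */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Count core cycles spent in fn(). Returns -1 on any failure. */
long long cycles_measure(void (*fn)(void))
{
	assert(fn != NULL);
	int fd = cycles_open();
	if (fd < 0)
		return -1;
	uint64_t count = 0;
	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	fn();
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	ssize_t got = read(fd, &count, sizeof count);
	close(fd);
	return got == (ssize_t)sizeof count ? (long long)count : -1;
}

/* Hypothetical stand-in workload; the real benchmark would call my_code(). */
void spin(void)
{
	volatile unsigned long x = 0;
	for (unsigned long i = 0; i < 1000000UL; ++i)
		x += i;
}
```

Calling cycles_measure(spin) at two different core frequencies should
give roughly the same count if the workload is core-bound, whereas the
TSC delta would scale with wall-clock time.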

Changing /sys/devices/system/clocksource/clocksource0/current_clocksource
to hpet doesn't really change anything.

TODO: find cycle count MSR.




