Re: Unexplained variance in run-time of trivial program

On 09/09/2025 13:21, Daniel Wagner wrote:

>> I note that IRQs 26 & 30 are still effectively pinned to CPU3,
>> despite the smp_affinity setting.
>>
>> $ cat /proc/interrupts 
>>            CPU0       CPU1       CPU2       CPU3       
>>   0:         27          0          0          0   IO-APIC   2-edge      timer
>>   8:          0          0          0          0   IO-APIC   8-edge      rtc0
>>   9:          0          4          0          0   IO-APIC   9-fasteoi   acpi
>>  16:          0         25          0          0   IO-APIC  16-fasteoi   ehci_hcd:usb1
>>  18:          0          5          0          0   IO-APIC  18-fasteoi   i801_smbus
>>  23:          0          0         29          0   IO-APIC  23-fasteoi   ehci_hcd:usb2
>>  24:          0          0          0          0  PCI-MSI-0000:00:1c.0   0-edge      PCIe PME, pciehp
>>  25:          0          0          0          0  PCI-MSI-0000:00:1c.3   0-edge      PCIe PME
>>  26:      23328          0          0      37250  PCI-MSI-0000:00:1f.2   0-edge      ahci[0000:00:1f.2]
>>  27:      86091          0          0          0  PCI-MSI-0000:00:14.0   0-edge      xhci_hcd
>>  28:          0          0      51308          0  PCI-MSIX-0000:02:00.0  0-edge      enp2s0
>>  29:          0          0         22          0  PCI-MSI-0000:00:16.0   0-edge      mei_me
>>  30:          0          0          0        604  PCI-MSI-0000:00:1b.0   0-edge      snd_hda_intel:card0
>>  31:     198664          0          0          0  PCI-MSI-0000:00:02.0   0-edge      i915
>>  32:          0        631          0          0  PCI-MSI-0000:00:03.0   0-edge      snd_hda_intel:card1
> 
> Many drivers are not isolcpus aware. At least the sound driver could be
> unloaded for your test I suppose.

I don't quite understand.
Do these drivers explicitly request that their ISRs run on CPU3?
If not, why doesn't the kernel just run these ISRs on a non-isolated core?
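
For reference, a minimal sketch in Python that lists which IRQs have a
non-zero count on a given CPU, given the text of /proc/interrupts. The
helper name irqs_on_cpu is hypothetical, and the sample is just three
lines copied from the output quoted above.

```python
def irqs_on_cpu(text, cpu):
    """Return {irq_label: count} for IRQs with a non-zero count on `cpu`."""
    lines = text.splitlines()
    cpus = lines[0].split()              # header row: CPU0 CPU1 ...
    col = cpus.index(f"CPU{cpu}")
    hits = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Skip rows without a full set of per-CPU counters (e.g. ERR, MIS).
        if len(fields) < len(cpus) or not fields[col].isdigit():
            continue
        count = int(fields[col])
        if count:
            hits[label.strip()] = count
    return hits


# Three lines taken from the /proc/interrupts output quoted above.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 26:      23328          0          0      37250  PCI-MSI-0000:00:1f.2   0-edge      ahci[0000:00:1f.2]
 28:          0          0      51308          0  PCI-MSIX-0000:02:00.0  0-edge      enp2s0
 30:          0          0          0        604  PCI-MSI-0000:00:1b.0   0-edge      snd_hda_intel:card0
"""

print(irqs_on_cpu(SAMPLE, 3))
```

On a live system the same helper could be fed
open("/proc/interrupts").read() instead of SAMPLE.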

>> Almost exactly 20 ms in excess.
>> Could this be a hint?
>> But there is absolutely NOTHING traced between 4606.628019 & 4606.811832.
> 
> Ensure your clock source is working correctly and...
> 
>> I guess either my time source is incorrect.
>> (Next slide in Frederic's guide)
>> OR there is something wonky going on inside CPU3.
> 
> ...there is no SMI running on this CPU, and no power management running
> (also in the BIOS settings)

Actually, I'm 99.9% sure that clock source accuracy & SMM are red herrings.

If I replace my code with this one:

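	mov $(1<<12), %eax	# loop counter: 4096 iterations
1:	dec %ecx		# two back-to-back decrements of %ecx form a
	dec %ecx		# dependency chain (2 cycles per iteration, see below)
	dec %eax		# count down
	jnz 1b			# repeat until %eax reaches zero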

which runs in ~2735 nanoseconds at 3 GHz
(2 cycles per iteration * 4096 iterations = 8192 cycles;
8192 cycles / 3 cycles per ns = ~2731 nanoseconds)

Running this trivial baseline benchmark 2^16 times should take
2735 ns * 2^16 = 179.241 ms
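
As a sanity check on that arithmetic, here is a minimal sketch in Python.
It assumes exactly 2 cycles per iteration and a core clock fixed at 3 GHz
(no turbo, no frequency scaling); the variable names are illustrative only.

```python
# Assumptions: 2 cycles per loop iteration, clock fixed at 3 GHz.
CYCLES_PER_ITER = 2
ITERATIONS = 1 << 12        # 4096, the value loaded into %eax
FREQ_GHZ = 3                # i.e. 3 cycles per nanosecond
REPEATS = 1 << 16           # number of times the benchmark is run
PER_RUN_NS = 2735           # per-run figure quoted in the text

cycles = CYCLES_PER_ITER * ITERATIONS
ideal_ns = cycles / FREQ_GHZ            # theoretical time for one run
total_ms = PER_RUN_NS * REPEATS / 1e6   # expected time for all runs

print(cycles, round(ideal_ns, 1), round(total_ms, 3))
```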

And if I repeat that 2^16-run measurement 1000 times, and sort by run-time,
I observe
MIN=179.374211 ms
MAX=179.406745 ms
So the worst case is only 166 microseconds worse than expected
(contrast this with 20 MILLIseconds for my code, 120 times worse)
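
The comparison can be reproduced with a few lines of Python. This is only
a sketch: the 20 ms figure is the approximate excess mentioned earlier in
the thread, and the constant names are made up for illustration.

```python
EXPECTED_MS = 179.241       # 2735 ns * 2^16
MIN_MS = 179.374211         # fastest of the 1000 measurements
MAX_MS = 179.406745         # slowest of the 1000 measurements
CODE_EXCESS_MS = 20.0       # approximate excess seen with the real code

worst_excess_us = (MAX_MS - EXPECTED_MS) * 1000   # baseline worst case vs. expected
spread_us = (MAX_MS - MIN_MS) * 1000              # min-to-max jitter of the baseline
ratio = CODE_EXCESS_MS * 1000 / worst_excess_us   # how much worse the real code is

print(round(worst_excess_us), round(spread_us, 1), round(ratio))
```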

It seems something is randomly stalling the pipeline on CPU3.
Could this be thermal throttling?
Is that supposed to be logged somewhere?
But why would my program be throttled & not the trivial baseline benchmark?

Regards




