On 09/09/2025 13:21, Daniel Wagner wrote:

>> I note that IRQs 25 & 30 are still effectively pinned to CPU3,
>> despite the smp_aff setting.
>>
>> $ cat /proc/interrupts
>>         CPU0    CPU1    CPU2    CPU3
>>   0:      27       0       0       0  IO-APIC               2-edge     timer
>>   8:       0       0       0       0  IO-APIC               8-edge     rtc0
>>   9:       0       4       0       0  IO-APIC               9-fasteoi  acpi
>>  16:       0      25       0       0  IO-APIC               16-fasteoi ehci_hcd:usb1
>>  18:       0       5       0       0  IO-APIC               18-fasteoi i801_smbus
>>  23:       0       0      29       0  IO-APIC               23-fasteoi ehci_hcd:usb2
>>  24:       0       0       0       0  PCI-MSI-0000:00:1c.0  0-edge     PCIe PME, pciehp
>>  25:       0       0       0       0  PCI-MSI-0000:00:1c.3  0-edge     PCIe PME
>>  26:   23328       0       0   37250  PCI-MSI-0000:00:1f.2  0-edge     ahci[0000:00:1f.2]
>>  27:   86091       0       0       0  PCI-MSI-0000:00:14.0  0-edge     xhci_hcd
>>  28:       0       0   51308       0  PCI-MSIX-0000:02:00.0 0-edge     enp2s0
>>  29:       0       0      22       0  PCI-MSI-0000:00:16.0  0-edge     mei_me
>>  30:       0       0       0     604  PCI-MSI-0000:00:1b.0  0-edge     snd_hda_intel:card0
>>  31:  198664       0       0       0  PCI-MSI-0000:00:02.0  0-edge     i915
>>  32:       0     631       0       0  PCI-MSI-0000:00:03.0  0-edge     snd_hda_intel:card1
>
> Many drivers are not isolcpus aware. At least the sound driver could be
> unloaded for your test I suppose.

I don't quite understand. Do these drivers explicitly request that
their ISR run on CPU3? Why doesn't the kernel just run these ISRs
on a non-isolated core?

>> Almost exactly 20 ms in excess.
>> Could this be a hint?
>> But there is absolutely NOTHING traced between 4606.628019 & 4606.811832.
>
> Ensure your clock source is working correctly and...
>
>> I guess either my time source is incorrect.
>> (Next slide in Frederic's guide)
>> OR there is something wonky going on inside CPU3.
>
> ...there is no SMI running on this CPU, and no power management running
> (also in the BIOS settings)

Actually, I'm 99.9% sure that clock source accuracy & SMM are red herrings.
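
As an aside on the IRQ question: to see at a glance which IRQs actually
fire on a given CPU, a minimal Python sketch along these lines should do.
It assumes the usual /proc/interrupts layout (a header row of CPUn columns,
then one "IRQ:" row per interrupt). The function name irqs_on_cpu and the
embedded SAMPLE (three rows copied from the table quoted above) are mine,
for illustration only.

```python
def irqs_on_cpu(text, cpu):
    """Return {irq: count} for rows with a non-zero count on the given CPU."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    hits = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        counts = fields[1:1 + ncpus]
        # Summary rows such as ERR: carry fewer columns; skip them.
        if len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue
        count = int(counts[cpu])
        if count:
            hits[fields[0].rstrip(":")] = count
    return hits


# Three rows copied from the quoted table (illustrative sample only).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 26:      23328          0          0      37250  PCI-MSI-0000:00:1f.2 0-edge ahci[0000:00:1f.2]
 30:          0          0          0        604  PCI-MSI-0000:00:1b.0 0-edge snd_hda_intel:card0
 31:     198664          0          0          0  PCI-MSI-0000:00:02.0 0-edge i915
"""

if __name__ == "__main__":
    print(irqs_on_cpu(SAMPLE, 3))
```

On a live system the same function can be fed the real file, e.g.
irqs_on_cpu(open("/proc/interrupts").read(), 3). For the three sample rows
it reports IRQ 26 and IRQ 30 on CPU3.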

If I replace my code with this one:

        mov $(1<<12), %eax
1:      dec %ecx
        dec %ecx
        dec %eax
        jnz 1b

which runs in ~2735 nanoseconds at 3 GHz
(2 cycles per iteration * 4096 iterations = 8192 cycles;
8192 cycles / 3 GHz = ~2730 nanoseconds)

Running this trivial baseline benchmark 2^16 times
should take 2735 ns * 2^16 = 179.241 ms

And if I run that benchmark 1000 times, and sort by run-time,
I observe
MIN=179.374211 ms
MAX=179.406745 ms

So worst-case is only 166 microseconds worse than expected
(contrast this to 20 MILLIseconds for my code, 120 times worse)

It seems something is randomly stalling the pipeline on CPU3.
This feels like thermal throttling maybe?
Is that supposed to be logged somewhere?
But why would my program throttle & not the trivial baseline benchmark?

Regards
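
P.S. The arithmetic above can be double-checked with a minimal Python
sketch. The constant and function names are mine, for illustration, and
the inputs are just the figures quoted in this message (3 GHz, 2 cycles
per iteration, ~2735 ns per pass, MAX=179.406745 ms, 20 ms excess).

```python
CLOCK_GHZ = 3.0           # nominal clock quoted in the message
CYCLES_PER_ITER = 2       # two dependent "dec %ecx" per loop iteration
ITERATIONS = 1 << 12      # inner loop count (mov $(1<<12), %eax)
REPEATS = 1 << 16         # how many times the baseline loop is run
QUOTED_PASS_NS = 2735     # per-pass run time quoted in the message


def summary(max_ms=179.406745, stall_ms=20.0):
    """Recompute the expected run time and the worst-case excess."""
    ideal_pass_ns = CYCLES_PER_ITER * ITERATIONS / CLOCK_GHZ
    expected_ms = QUOTED_PASS_NS * REPEATS / 1e6
    excess_us = (max_ms - expected_ms) * 1e3
    # How many times larger the 20 ms stall is than the baseline excess.
    ratio = stall_ms * 1e3 / excess_us
    return ideal_pass_ns, expected_ms, excess_us, ratio


if __name__ == "__main__":
    ideal, expected, excess, ratio = summary()
    print(f"ideal pass   : {ideal:.1f} ns")
    print(f"expected run : {expected:.3f} ms")
    print(f"worst excess : {excess:.0f} us")
    print(f"20 ms stall  : {ratio:.0f}x the baseline excess")
```

The ideal per-pass figure (8192 cycles at 3 GHz) comes out a few
nanoseconds below the ~2735 ns used for the expected run time, which is
why the two are kept as separate constants.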