On 9/12/25 09:44, Daniel Wagner wrote:
> On Fri, Sep 12, 2025 at 12:15:12AM +0200, Marc Gonzalez wrote:
>
>> - Not sure what the int_misc.recovery_* events measure???
>>   They're documented as:
>>   "This event counts the number of cycles spent waiting for a
>>   recovery after an event such as a processor nuke, JEClear, assist,
>>   hle/rtm abort etc."
>>   and
>>   "Core cycles the allocator was stalled due to recovery from earlier
>>   clear event for any thread running on the physical core (e.g.
>>   misprediction or memory nuke)."
>>   => In my case, they're probably measuring the same thing.
>>   Weird that the description sounds a bit different.
>>   Need to read up on processor nuke, memory nuke, machine clear event,
>>   JEClear, assist, HLE/RTM...
>>
>> Daniel, seems you were spot on when mentioning side channel attacks.
>
> I'm sorry if I sounded patronizing, that was not my intent. After we
> ruled out the OS noise the only thing left was that the runtime variance
> is from the CPU itself.

I appreciate all the insights you've shared thus far! :)
(As well as those from the other participants!)

>> https://www.usenix.org/system/files/sec21-ragab.pdf
>>
>> I need to truly read & understand this paper.
>
> Skimmed over it, it could explain what you are observing. The question
> obviously is which part of the 'bad' code is confusing the CPU :)

GOOD RUN
D,C,Cb,F,T,N 1310881180 4315846773 4315846665 3292324 5000 262144
6808319919 inst_retired.any
7513464787 uops_executed.core
  30496248 uops_executed.stall_cycles
 164691876 uops_retired.stall_cycles
   6292081 int_misc.recovery_cycles
    292758 machine_clears.count
    292049 machine_clears.memory_ordering

BAD RUN
D,C,Cb,F,T,N 1417213267 4665926496 4665926108 3292324 5406 262144
6808316250 inst_retired.any
7614270474 uops_executed.core
  85148157 uops_executed.stall_cycles
 285284384 uops_retired.stall_cycles
  13520285 int_misc.recovery_cycles
   1536963 machine_clears.count
   1536308 machine_clears.memory_ordering

Notes:

- A "good" run gets ~300k machine clears.
- A "bad" run gets ~1.5M machine clears.
- All machine clears are memory-ordering machine clears:
  there is no self-modifying code and no floating-point code.
  Probably a few MCs come from page faults.
  (TODO: measure only the MCs from my own code, not from the setup code;
  see the sketch at the end of this mail.)

If I understand correctly, a memory-ordering machine clear only happens
when different cores access the same memory? But I'm running single-threaded
code on an otherwise idle system.

I'm thinking that Haswell might lump memory-ordering AND memory-disambiguation
MCs together? Is that a reasonable assumption?

I'm not sure what is non-deterministic about the loads in my code.
It might be some kind of "catastrophic" chain of events, where one mispredict
generates a cascade of subsequent mispredicts.

I will keep digging :)

Regards
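
PS: Regarding the TODO above (counting only the machine clears from the
measured region, not from the setup code): here is a minimal, untested
sketch using perf_event_open(2), with the counter created disabled and
enabled only around the region of interest. The raw encoding 0x02c3 for
machine_clears.memory_ordering (event 0xC3, umask 0x02 on Haswell) is my
assumption; please double-check it against the event list for your CPU.

/* count_mc.c -- sketch: count machine_clears.memory_ordering only
 * around the measured region, so the setup code does not pollute
 * the counts.
 */
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>

static int open_raw_counter(uint64_t raw_config)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_RAW;
        attr.size = sizeof(attr);
        attr.config = raw_config;       /* e.g. 0x02c3 (assumed encoding) */
        attr.disabled = 1;              /* start disabled */
        attr.exclude_kernel = 1;        /* count user space only */
        attr.exclude_hv = 1;

        /* pid=0, cpu=-1: this thread, on whatever CPU it runs */
        return syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
        int fd = open_raw_counter(0x02c3);
        uint64_t count;

        if (fd < 0) {
                perror("perf_event_open");
                return 1;
        }

        /* ... setup code runs here, uncounted ... */

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        /* ... code under test runs here ... */

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        if (read(fd, &count, sizeof(count)) == sizeof(count))
                printf("machine clears in measured region: %llu\n",
                       (unsigned long long)count);
        close(fd);
        return 0;
}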
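
PPS: On the memory-disambiguation guess above, the kind of pattern I have
in mind is the following (purely illustrative, not taken from my actual
code): a store through one pointer immediately followed by a load through
another pointer that only occasionally aliases it. If the core
speculatively lets the load bypass the older store and the two do alias,
the speculation has to be repaired, which, as far as I understand, shows
up as a machine clear.

/* Illustrative sketch only: 'slot' may or may not point into table[].
 * When it does, the load below depends on the store, and a wrong
 * "no alias" guess by the memory-disambiguation predictor must be
 * repaired with a machine clear.
 */
static int table[1 << 16];

static int update(int *slot, unsigned idx)
{
        *slot = (int)idx;               /* older store */
        return table[idx & 0xffff];     /* younger load, sometimes aliases *slot */
}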