On 9/12/25 09:44, Daniel Wagner wrote:
> On Fri, Sep 12, 2025 at 12:15:12AM +0200, Marc Gonzalez wrote:
>
>> - Not sure what the int_misc.recovery_* events measure???
>>   They're documented as:
>>   "This event counts the number of cycles spent waiting for a
>>   recovery after an event such as a processor nuke, JEClear, assist,
>>   hle/rtm abort etc."
>>   and
>>   "Core cycles the allocator was stalled due to recovery from earlier
>>   clear event for any thread running on the physical core (e.g.
>>   misprediction or memory nuke)."
>>   => In my case, they're probably measuring the same thing.
>>   Weird that the description sounds a bit different.
>>   Need to read up on processor nuke, memory nuke, machine clear event,
>>   JEClear, assist, HLE/RTM...
>>
>> Daniel, seems you were spot on when mentioning side channel attacks.
>
> I'm sorry if I sounded patronizing, that was not my intent. After we
> ruled out the OS noise the only thing left was that the runtime variance
> is from the CPU itself.

I appreciate all the insights you've shared thus far! :)
(As well as those from the other participants!)

>> https://www.usenix.org/system/files/sec21-ragab.pdf
>>
>> I need to truly read & understand this paper.
>
> Skimmed over it, it could explain what you are observing. The question
> obviously is which part of the 'bad' code is confusing the CPU :)

GOOD RUN
D,C,Cb,F,T,N 1310881180 4315846773 4315846665 3292324 5000 262144
6808319919 inst_retired.any
7513464787 uops_executed.core
  30496248 uops_executed.stall_cycles
 164691876 uops_retired.stall_cycles
   6292081 int_misc.recovery_cycles
    292758 machine_clears.count
    292049 machine_clears.memory_ordering

BAD RUN
D,C,Cb,F,T,N 1417213267 4665926496 4665926108 3292324 5406 262144
6808316250 inst_retired.any
7614270474 uops_executed.core
  85148157 uops_executed.stall_cycles
 285284384 uops_retired.stall_cycles
  13520285 int_misc.recovery_cycles
   1536963 machine_clears.count
   1536308 machine_clears.memory_ordering

Notes:

- A "good" run gets ~300k machine clears.
- A "bad" run gets ~1.5M machine clears.
- All machine clears are memory-ordering machine clears:
  there is no self-modifying code and no floating-point code.
  Probably a few MCs come from page faults.
  (TODO: measure only the MCs from my own code, not from the setup code;
  see the sketch at the end of this mail.)

If I understand correctly, a memory-ordering machine clear only happens
when different cores access the same memory? But I'm running single-threaded
code on an otherwise idle system.

I'm thinking that Haswell might lump memory-ordering AND memory-disambiguation
MCs together? Is that a reasonable assumption?

I'm not sure what is non-deterministic about the loads in my code.
It might be some kind of "catastrophic" chain of events, where one mispredict
generates a cascade of subsequent mispredicts.

I will keep digging :)

Regards
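
PS: Regarding the TODO above (counting only the machine clears from the
measured region, not from the setup code): here is a minimal, untested
sketch using perf_event_open(2), with the counter created disabled and
enabled only around the region of interest. The raw encoding 0x02c3 for
machine_clears.memory_ordering (event 0xC3, umask 0x02 on Haswell) is my
assumption; please double-check it against the event list for your CPU.

/* count_mc.c -- sketch: count machine_clears.memory_ordering only
 * around the measured region, so the setup code does not pollute
 * the counts.
 */
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>

static int open_raw_counter(uint64_t raw_config)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_RAW;
        attr.size = sizeof(attr);
        attr.config = raw_config;       /* e.g. 0x02c3 (assumed encoding) */
        attr.disabled = 1;              /* start disabled */
        attr.exclude_kernel = 1;        /* count user space only */
        attr.exclude_hv = 1;

        /* pid=0, cpu=-1: this thread, on whatever CPU it runs */
        return syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
        int fd = open_raw_counter(0x02c3);
        uint64_t count;

        if (fd < 0) {
                perror("perf_event_open");
                return 1;
        }

        /* ... setup code runs here, uncounted ... */

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        /* ... code under test runs here ... */

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        if (read(fd, &count, sizeof(count)) == sizeof(count))
                printf("machine clears in measured region: %llu\n",
                       (unsigned long long)count);
        close(fd);
        return 0;
}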
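
PPS: On the memory-disambiguation guess above, the kind of pattern I have
in mind is the following (purely illustrative, not taken from my actual
code): a store through one pointer immediately followed by a load through
another pointer that only occasionally aliases it. If the core
speculatively lets the load bypass the older store and the two do alias,
the speculation has to be repaired, which, as far as I understand, shows
up as a machine clear.

/* Illustrative sketch only: 'slot' may or may not point into table[].
 * When it does, the load below depends on the store, and a wrong
 * "no alias" guess by the memory-disambiguation predictor must be
 * repaired with a machine clear.
 */
static int table[1 << 16];

static int update(int *slot, unsigned idx)
{
        *slot = (int)idx;               /* older store */
        return table[idx & 0xffff];     /* younger load, sometimes aliases *slot */
}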