On Tue, Apr 15, 2025 at 3:05 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Mon, Apr 14, 2025, Uros Bizjak wrote:
> > Micro-optimize vmx_do_interrupt_irqoff() by substituting
> > MOV %RBP,%RSP; POP %RBP instruction sequence with equivalent
> > LEAVE instruction. GCC compiler does this by default for
> > a generic tuning and for all modern processors:
>
> Out of curiosity, is LEAVE actually a performance win, or is the benefit essentially
> just the few code bytes saved?

It is hard to say for out-of-order execution cores, especially when
the stack engine is thrown into the mix (these two instructions, plus
the following RET, all update %rsp). The pragmatic solution was to
follow the compiler's choice, based on the tuning below.

> > DEF_TUNE (X86_TUNE_USE_LEAVE, "use_leave",
> >           m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_ZHAOXIN
> >           | m_TREMONT | m_CORE_HYBRID | m_CORE_ATOM | m_GENERIC)

The tuning is updated when a new target is introduced to the compiler
and is based on various measurements by the processor manufacturer.
The above covers the majority of recent processors (plus generic
tuning), so I guess we won't go wrong by following suit. OTOH, any
performance difference will be negligible.

> > The new code also saves a couple of bytes, from:
> >
> >   27:   48 89 ec                mov    %rbp,%rsp
> >   2a:   5d                      pop    %rbp
> >
> > to:
> >
> >   27:   c9                      leave

Thanks,
Uros.