Hello Dave,

On 8/7/2025 11:30 PM, Dave Hansen wrote:
> On 8/7/25 09:59, Kim Phillips wrote:
>> Add the new CPUID bit that indicates available hardware support:
>> CPUID_Fn8000001F_EAX [AMD Secure Encryption EAX] bit 25.
>>
>> Indicate support for SEV_FEATURES bit 15 (SmtProtection) to be set by
>> an SNP guest to enable the feature.
>
> It would be ideal to see an logical description of what "SmtProtection"
> is and what it means for the kernel as opposed to referring to the
> documentation and letting reviewers draw their own conclusions.

I'll try to elaborate on the general idea of SMT Protection for an
SEV-SNP VM:

The idea is that when a vCPU is running (between VMRUN and VMEXIT), the
sibling CPU must be idle - in HLT or C2 state.

If the sibling is not idling in one of those states, the VMRUN will
immediately exit with the "VMEXIT_IDLE_REQUIRED" error code.

Ideally, some layer in KVM / kernel has to coordinate the following:

(I'm using thread_info flags for illustrative purposes)

  CPU0 (Running vCPU)                                   CPU128 (SMT sibling)
  ===================                                   ====================

  /* VMRUN Path */                                      /*
  set_thread_flag(TIF_SVM_PROTECTED);                    * Core scheduling ensures this thread is
  retry:                                                 * forced into an idle state.
    while (!(READ_ONCE(smt_ti->flags) & TIF_IDLING))     * XXX: Needs to only select HLT / C2
      cpu_relax();                                       */
      cpu_relax();                                      if (READ_ONCE(smt_ti->flags) & TIF_SVM_PROTECTED)
      cpu_relax();                                        force_hlt_or_c2()
      cpu_relax();                                          set_thread_flag(TIF_IDLING);
    /* Sees TIF_IDLING on SMT */                            native_safe_halt();
    VMRUN /* Success */

Here is a case where the VMRUN fails with "VMEXIT_IDLE_REQUIRED":

  CPU0 (Running vCPU)                                   CPU128 (SMT sibling)
  ===================                                   ====================

  /* VMRUN Path */                                      /*
  set_thread_flag(TIF_SVM_PROTECTED);                    * Core scheduling ensures this thread is
  retry:                                                 * forced into an idle state.
    while (!(READ_ONCE(smt_ti->flags) & TIF_IDLING))     * XXX: Needs to only select HLT / C2
      cpu_relax();                                       */
      cpu_relax();                                      if (READ_ONCE(smt_ti->flags) & TIF_SVM_PROTECTED)
      cpu_relax();                                        force_hlt_or_c2()
      cpu_relax();                                          set_thread_flag(TIF_IDLING);
    /* Sees TIF_IDLING on SMT */                            native_safe_halt()
    ... /* Interrupted before VMRUN */                        sti; hlt; /* Receives an interrupt */
    ...                                                       /* Thread is busy running interrupt handler */
    VMRUN /* Fails */                                         ... /* Busy */
    VMGEXIT /* VMEXIT_IDLE_REQUIRED */
    if (exit_code == SVM_VMGEXIT_IDLE_REQUIRED)
      goto retry;

Obviously we cannot just disable interrupts on the sibling - if a high
priority task wakes up on the SMT sibling, the core scheduling
infrastructure will preempt the vCPU and run the high priority task on
the sibling.

This is where the "IDLE_WAKEUP_ICR" MSR (MSR_AMD64_HLT_WAKEUP_ICR) comes
into play - when a CPU is idle and its SMT sibling is running the vCPU
of an SMT Protected guest, the idle CPU will not immediately exit idle
when receiving an interrupt (or any "wake up event").

Instead, the H/W writes the value of IDLE_WAKEUP_ICR into the local APIC
register and waits. The expectation is that an interrupt will be sent to
the sibling CPU, which will cause a VMEXIT on the sibling, and then the
H/W will exit idle and start running the interrupt handler.

This is the full flow with IDLE_WAKEUP_ICR programming:

  CPU0 (Running vCPU)                                   CPU128 (SMT sibling)
  ===================                                   ====================

  /* VMRUN Path */                                      /*
  set_thread_flag(TIF_SVM_PROTECTED);                    * Core scheduling ensures this thread is
  retry:                                                 * forced into an idle state.
    while (!(READ_ONCE(smt_ti->flags) & TIF_IDLING))     * XXX: Needs to only select HLT / C2
      cpu_relax();                                       */
      cpu_relax();                                      if (READ_ONCE(smt_ti->flags) & TIF_SVM_PROTECTED)
      cpu_relax();                                        force_hlt_or_c2()
      cpu_relax();                                          /* Program to send IPI to CPU0 */
      cpu_relax();                                          wrmsrl(MSR_AMD64_HLT_WAKEUP_ICR, ...)
      cpu_relax();                                          set_thread_flag(TIF_IDLING);
    /* Sees TIF_IDLING on SMT */                            native_safe_halt()
    ...                                                       sti; hlt; /* Idle */
    VMRUN /* Success */                                       ... /* Idle */
    ... /* Running protected guest. */                        ...
    ...                                                       /*
    ...                                                        * Receives an interrupt. H/W writes
    ...                                                        * value in MSR_AMD64_HLT_WAKEUP_ICR
    ...                                                        * to the local APIC.
    ...                                                        */
    /* Interrupted */                                         ... /* Still idle */
    VMEXIT                                                    /* Exits idle, executes interrupt. */
    /* Handle the dummy interrupt. */
    goto retry;

Apart from the "MSR_AMD64_HLT_WAKEUP_ICR" related bits, the coordination
to force idle the sibling and wait until HLT / C2 is executed has to
be done by the OS / KVM.

As Kim mentions, core scheduling can only ensure that the SMT sibling
starts running the idle task, but the VMRUN for an SMT Protected guest
requires the idle thread on the sibling to reach the idle instruction in
order to proceed.

Furthermore, every little noise on the sibling will cause the guest to
continuously exit out, which is a whole different challenge to deal with,
and I'm assuming folks will use isolated partitions to get around that.

-- 
Thanks and Regards,
Prateek