Re: [RFC PATCH 1/1] KVM: SEV: Add support for SMT Protection

Hello Dave,

On 8/7/2025 11:30 PM, Dave Hansen wrote:
> On 8/7/25 09:59, Kim Phillips wrote:
>> Add the new CPUID bit that indicates available hardware support:
>> CPUID_Fn8000001F_EAX [AMD Secure Encryption EAX] bit 25.
>>
>> Indicate support for SEV_FEATURES bit 15 (SmtProtection) to be set by
>> an SNP guest to enable the feature.
> 
> It would be ideal to see a logical description of what "SmtProtection"
> is and what it means for the kernel as opposed to referring to the
> documentation and letting reviewers draw their own conclusions.

I'll try to elaborate on the general idea of SMT Protection for SEV-SNP
VMs: when a vCPU is running (between VMRUN and VMEXIT), the sibling CPU
must be idle - in the HLT or C2 state.

If the sibling is not idling in one of those states, the VMRUN
immediately exits with the "VMEXIT_IDLE_REQUIRED" error code.

Ideally, some layer in KVM / kernel has to coordinate the following:

  (I'm using thread_info flags for illustrative purposes)

                CPU0 (Running vCPU)                               CPU128 (SMT sibling)
                ===================                               ====================

  /* VMRUN Path */                                      /*
  set_thread_flag(TIF_SVM_PROTECTED);                    * Core scheduling ensures this thread is
  retry:                                                 * forced into an idle state.
    while (!(READ_ONCE(smt_ti->flags) & TIF_IDLING))     * XXX: Needs to only select HLT / C2
      cpu_relax();                                       */
      cpu_relax();                                      if (READ_ONCE(smt_ti->flags) & TIF_SVM_PROTECTED)
      cpu_relax();                                        force_hlt_or_c2()
      cpu_relax();                                          set_thread_flag(TIF_IDLING);
      /* Sees TIF_IDLING on SMT */                          native_safe_halt(); 
      VMRUN /* Success */
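
To make the left-hand column above a bit more concrete, here is a minimal
C sketch of the wait on the VMRUN side. As noted above, TIF_SVM_PROTECTED,
TIF_IDLING, and the way the sibling's thread_info is obtained are purely
illustrative and not existing kernel symbols:

  static void svm_wait_for_sibling_idle(struct thread_info *smt_ti)
  {
          /* Ask the sibling's idle path to park itself in HLT / C2. */
          set_thread_flag(TIF_SVM_PROTECTED);

          /* Spin until the sibling has actually reached the idle instruction. */
          while (!(READ_ONCE(smt_ti->flags) & _TIF_IDLING))
                  cpu_relax();
  }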


Here is a case where the VMRUN fails with "VMEXIT_IDLE_REQUIRED":

                CPU0 (Running vCPU)                               CPU128 (SMT sibling)
                ===================                               ====================

  /* VMRUN Path */                                      /*
  set_thread_flag(TIF_SVM_PROTECTED);                    * Core scheduling ensures this thread is
  retry:                                                 * forced into an idle state.
    while (!(READ_ONCE(smt_ti->flags) & TIF_IDLING))     * XXX: Needs to only select HLT / C2
      cpu_relax();                                       */
      cpu_relax();                                      if (READ_ONCE(smt_ti->flags) & TIF_SVM_PROTECTED)
      cpu_relax();                                        force_hlt_or_c2()
      cpu_relax();                                          set_thread_flag(TIF_IDLING);
      /* Sees TIF_IDLING on SMT */                          native_safe_halt()
      ... /* Interrupted before VMRUN */                      sti; hlt; /* Receives an interrupt */
      ...                                                     /* Thread is busy running interrupt handler */
      VMRUN /* Fails */                                       ... /* Busy */
      VMGEXIT /* VMEXIT_IDLE_REQUIRED */
        if (exit_code == SVM_VMGEXIT_IDLE_REQUIRED)
          goto retry;
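
The retry on the VMRUN side from the two flows above could then look
roughly like the sketch below; apart from SVM_VMGEXIT_IDLE_REQUIRED, all
of the names (run_protected_guest(), the return convention) are made up
for illustration and do not correspond to existing KVM code:

  static void svm_protected_vcpu_run(struct thread_info *smt_ti)
  {
          u64 exit_code;

          do {
                  /* Re-establish the invariant: sibling parked in HLT / C2. */
                  svm_wait_for_sibling_idle(smt_ti);

                  /*
                   * VMRUN; if the sibling was interrupted before the entry
                   * completed, it fails and we see SVM_VMGEXIT_IDLE_REQUIRED.
                   */
                  exit_code = run_protected_guest();
          } while (exit_code == SVM_VMGEXIT_IDLE_REQUIRED);

          /* Handle the real exit reason from here on. */
  }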


Obviously we cannot just disable interrupts on the sibling - if a
high-priority task wakes up on the SMT sibling, the core scheduling
infrastructure will preempt the vCPU and run the high-priority task on
the sibling.

This is where the "IDLE_WAKEUP_ICR" MSR (MSR_AMD64_HLT_WAKEUP_ICR) comes
into play - when a CPU is idle and its SMT sibling is running the vCPU of
an SMT Protected guest, the idle CPU will not immediately exit idle when
receiving an interrupt (or any other wake-up event).

Instead, the hardware writes the value of IDLE_WAKEUP_ICR into the local
APIC ICR and the CPU stays idle. The expectation is that this sends an
interrupt to the sibling CPU, which causes a VMEXIT on the sibling; the
hardware then lets the idle CPU exit idle and start running the interrupt
handler.

This is the full flow with IDLE_WAKEUP_ICR programming:

                CPU0 (Running vCPU)                               CPU128 (SMT sibling)
                ===================                               ====================

  /* VMRUN Path */                                      /*
  set_thread_flag(TIF_SVM_PROTECTED);                    * Core scheduling ensures this thread is
  retry:                                                 * forced into an idle state.
    while (!(READ_ONCE(smt_ti->flags) & TIF_IDLING))     * XXX: Needs to only select HLT / C2
      cpu_relax();                                       */
      cpu_relax();                                      if (READ_ONCE(smt_ti->flags) & TIF_SVM_PROTECTED)
      cpu_relax();                                        force_hlt_or_c2()
      cpu_relax();                                          /* Program to send IPI to CPU0 */
      cpu_relax();                                          wrmsrl(MSR_AMD64_HLT_WAKEUP_ICR, ...)
      cpu_relax();                                          set_thread_flag(TIF_IDLING);
      /* Sees TIF_IDLING on SMT */                          native_safe_halt()
      ...                                                     sti; hlt; /* Idle */
      VMRUN /* Success */                                     ... /* Idle */
      ... /* Running protected guest. */                      ... 
      ...                                                     /*
      ...                                                      * Receives an interrupt. H/W writes
      ...                                                      * value in MSR_AMD64_HLT_WAKEUP_ICR
      ...                                                      * to the local APIC.
      ...                                                      */
      /* Interrupted */                                        ... /* Still idle */
      VMEXIT                                                   /* Exits idle, executes interrupt. */
      /* Handle the dummy interrupt. */
      goto retry;
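
For completeness, here is a sketch of the idle-entry step from the
right-hand column above. The TIF flags are again illustrative, and the
layout of the value written to MSR_AMD64_HLT_WAKEUP_ICR (a fixed-mode IPI
with a dummy vector aimed at the sibling) is my assumption about the
encoding, not something I have checked against the APM:

  static void svm_protected_halt(struct thread_info *smt_ti, u32 sibling_apic_id)
  {
          if (READ_ONCE(smt_ti->flags) & _TIF_SVM_PROTECTED) {
                  /*
                   * Have the hardware turn our wake-up event into an IPI to
                   * the sibling, so its VMRUN exits before we leave idle.
                   * DUMMY_WAKEUP_VECTOR is a made-up vector for illustration.
                   */
                  u64 icr = ((u64)sibling_apic_id << 32) |
                            APIC_DM_FIXED | DUMMY_WAKEUP_VECTOR;

                  wrmsrl(MSR_AMD64_HLT_WAKEUP_ICR, icr);
                  set_thread_flag(TIF_IDLING);
          }

          native_safe_halt();     /* sti; hlt */
  }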


Apart from the "MSR_AMD64_HLT_WAKEUP_ICR" related bits, the coordination
to force the sibling idle and to wait until HLT / C2 is executed has to
be done by the OS / KVM.

As Kim mentions, core scheduling can only ensure that the SMT sibling
starts running the idle task, but the VMRUN for an SMT Protected guest
requires the idle thread on the sibling to actually reach the idle
instruction in order to proceed.

Furthermore, every little bit of noise on the sibling will cause the
guest to continuously exit, which is a whole different challenge to deal
with; I'm assuming folks will use isolated partitions to get around
that.

-- 
Thanks and Regards,
Prateek




