Re: [PATCH] KVM: SVM: Fix SNP AP destroy race with VMRUN

Tom Lendacky <thomas.lendacky@xxxxxxx> · Fri, 21 Mar 2025 11:20:03 -0500

On 3/18/25 08:47, Tom Lendacky wrote:
> On 3/18/25 07:43, Tom Lendacky wrote:
>> On 3/17/25 12:36, Tom Lendacky wrote:
>>> On 3/17/25 12:28, Sean Christopherson wrote:
>>>> On Mon, Mar 17, 2025, Tom Lendacky wrote:
>>>>> On 3/17/25 12:20, Tom Lendacky wrote:
>>>>>> An AP destroy request for a target vCPU is typically followed by an
>>>>>> RMPADJUST to remove the VMSA attribute from the page currently being
>>>>>> used as the VMSA for the target vCPU. This can result in a vCPU that
>>>>>> is about to VMRUN to exit with #VMEXIT_INVALID.
>>>>>>
>>>>>> This usually does not happen as APs are typically sitting in HLT when
>>>>>> being destroyed and therefore the vCPU thread is not running at the time.
>>>>>> However, if HLT is allowed inside the VM, then the vCPU could be about to
>>>>>> VMRUN when the VMSA attribute is removed from the VMSA page, resulting in
>>>>>> a #VMEXIT_INVALID when the vCPU actually issues the VMRUN and causing the
>>>>>> guest to crash. An RMPADJUST against an in-use (already running) VMSA
>>>>>> results in a #NPF for the vCPU issuing the RMPADJUST, so the VMSA
>>>>>> attribute cannot be changed until the VMRUN for target vCPU exits. The
>>>>>> Qemu command line option '-overcommit cpu-pm=on' is an example of allowing
>>>>>> HLT inside the guest.
>>>>>>
>>>>>> Use kvm_test_request() to ensure that the target vCPU sees the AP destroy
>>>>>> request before returning to the initiating vCPU.
>>>>>>
>>>>>> Fixes: e366f92ea99e ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
>>>>>> Signed-off-by: Tom Lendacky <thomas.lendacky@xxxxxxx>

>>>>
>>>> Very off-the-cuff, but I assume KVM_REQ_UPDATE_PROTECTED_GUEST_STATE just needs
>>>> to be annotated with KVM_REQUEST_WAIT.
>>>
>>> Ok, nice. I wasn't sure if KVM_REQUEST_WAIT would be appropriate here.
>>> This is much simpler. Let me test it out and resend if everything goes ok.
>>
>> So that doesn't work. I can still get an occasional #VMEXIT_INVALID. Let
>> me try to track down what is happening with this approach...
> 
> Looks like I need to use kvm_make_vcpus_request_mask() instead of just a
> plain kvm_make_request() followed by a kvm_vcpu_kick().
> 
> Let me try that and see how this works.

This appears to be working ok. The kvm_make_vcpus_request_mask() function
would need to be EXPORT_SYMBOL_GPL, though, any objections to that?

I could also simplify this a bit by creating a new function that takes a
target vCPU and then calls kvm_make_vcpus_request_mask() from there.
Thoughts?

This is what the patch currently looks like:

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 32ae3aa50c7e..51aa63591b0a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -123,7 +123,8 @@
 	KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_HV_TLB_FLUSH \
 	KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE	KVM_ARCH_REQ(34)
+#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
+	KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6e3f5042d9ce..0c45cc0c0571 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4038,8 +4038,13 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
 
 out:
 	if (kick) {
-		kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
-		kvm_vcpu_kick(target_vcpu);
+		DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
+
+		bitmap_zero(vcpu_bitmap, KVM_MAX_VCPUS);
+		bitmap_set(vcpu_bitmap, target_vcpu->vcpu_idx, 1);
+		kvm_make_vcpus_request_mask(vcpu->kvm,
+					    KVM_REQ_UPDATE_PROTECTED_GUEST_STATE,
+					    vcpu_bitmap);
 	}
 
 	mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ba0327e2d0d3..08c135f3d31f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -268,6 +268,7 @@ bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
 
 	return called;
 }
+EXPORT_SYMBOL_GPL(kvm_make_vcpus_request_mask);
 
 bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
 {


Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>
>> Thanks,
>> Tom
>>
>>>
>>> Thanks,
>>> Tom
>>>
>>>>
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index 04e6c5604bc3..67abfe97c600 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -124,7 +124,8 @@
>>>>         KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>>>  #define KVM_REQ_HV_TLB_FLUSH \
>>>>         KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>>> -#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE   KVM_ARCH_REQ(34)
>>>> +#define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE \
>>>> +       KVM_ARCH_REQ_FLAGS(34, KVM_REQUEST_WAIT)
>>>>  
>>>>  #define CR0_RESERVED_BITS                                               \
>>>>         (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
>>>>
>>>>