Re: [PATCH v13 05/21] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 09, 2025, Chao Gao wrote:
> On Mon, Aug 25, 2025 at 10:55:20AM +0800, Chao Gao wrote:
> >On Sun, Aug 24, 2025 at 06:52:55PM -0700, Xin Li wrote:
> >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>> index 6b01c6e9330e..799ac76679c9 100644
> >>> --- a/arch/x86/kvm/x86.c
> >>> +++ b/arch/x86/kvm/x86.c
> >>> @@ -4566,6 +4569,21 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >>>   }
> >>>   EXPORT_SYMBOL_GPL(kvm_get_msr_common);
> >>> +/*
> >>> + *  Returns true if the MSR in question is managed via XSTATE, i.e. is context
> >>> + *  switched with the rest of guest FPU state.
> >>> + */
> >>> +static bool is_xstate_managed_msr(u32 index)
> >>> +{
> >>> +	switch (index) {
> >>> +	case MSR_IA32_U_CET:
> >>
> >>
> >>Why MSR_IA32_S_CET is not included here?

Because the guest's S_CET must *never* be resident in harware while running in
the host.  Doing so would create an egregious security issue due to letting the
guest disabled IBT and/or shadow stacks, or alternatively crash the host by
enabling one or the other.

Having guest MSR_IA32_PL[0-3]_SSP resident in hardware while the _kernel_ is
running is safe, because those MSRs are only consumed on transitions to lower
privilege levels, i.e. from KVM's perspective, they're effectively user-return
MSRs that get restored on exit to userspace thanks to kvm_{load,put}_guest_fpu()
context switching between VMM and guest state (if the vCPU task is preempted,
the normal context switch code handles swapping state between tasks, it's only
the VMM vs. guest state that needs dedicated handling since they are the same
task).

Context switching S_CET as part of XRSTORS very, VERY subtly works by virtue of
S_CET already being loaded with the host's value on VM-Exit.  I.e. the value
saved into guest FPU state is always the host's value, and thus the value loaded
from guest FPU state is always the host's value.

And because all of that isn't enough, the final wrinkle is that KVM_{G,S}ET_XSAVE
only operate on xcr0 / user state, i.e. don't allow userspace to load supervisor
(S_CET) state into the kernel.

> >Emm. I didn't think about this
> >
> >MSR_IA32_S_CET is read from or written to a dedicated VMCS/B field, so KVM
> >doesn't need to load the guest FPU to access MSR_IA32_S_CET. This pairs with
> >the kvm_{get,set}_xstate_msr() in kvm_{get,set}_msr_common().
> >
> >That said, userspace writes can indeed cause an inconsistency between the guest
> >FPU and VMCS fields regarding MSR_IA32_S_CET. If migration occurs right after a
> >userspace write (without a VM-entry, which would bring them in sync) and
> >userspace just restores MSR_IA32_S_CET from the guest FPU, the write before
> >migration could be lost.
> >
> >If that migration issue is a practical problem, I think MSR_IA32_S_CET should
> >be included here, and we need to perform a kvm_set_xstate_msr() after writing
> >to the VMCS/B.
> 
> I prefer to make guest FPU and VMCS always consistent regarding MSR_IA32_S_CET.
> The cost is to load guest FPU before userspace accesses the MSR and save guest
> FPU after that. It isn't a problem as MSR_IA32_S_CET accesses from userspace
> shouldn't be on any hot-path. So, I will add:
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 989008f5307e..92daf63c9487 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2435,6 +2435,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 		break;
> 	case MSR_IA32_S_CET:
> 		vmcs_writel(GUEST_S_CET, data);
> +		kvm_set_xstate_msr(vcpu, msr_info);

As explained above, this is a fatal flaw, as it will allow userspace to write an
arbitrary value to S_CET.

> 		break;
> 	case MSR_KVM_INTERNAL_GUEST_SSP:
> 		vmcs_writel(GUEST_SSP, data);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a3770e3d5154..6f64a3355274 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4647,6 +4647,7 @@ EXPORT_SYMBOL_GPL(kvm_get_msr_common);
>  static bool is_xstate_managed_msr(u32 index)
>  {
> 	switch (index) {
> +	case MSR_IA32_S_CET:

And this is wrong too.  I'll add a big comment to explain all of this, especially
since I just spent an entire morning re-learning how all of this works.  *sigh*

> 	case MSR_IA32_U_CET:
> 	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux