On Tue, Sep 09, 2025 at 01:03:43PM -0700, Sean Christopherson wrote:
>On Tue, Sep 09, 2025, Chao Gao wrote:
>> On Mon, Aug 25, 2025 at 10:55:20AM +0800, Chao Gao wrote:
>> >On Sun, Aug 24, 2025 at 06:52:55PM -0700, Xin Li wrote:
>> >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> >>> index 6b01c6e9330e..799ac76679c9 100644
>> >>> --- a/arch/x86/kvm/x86.c
>> >>> +++ b/arch/x86/kvm/x86.c
>> >>> @@ -4566,6 +4569,21 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> >>>  }
>> >>>  EXPORT_SYMBOL_GPL(kvm_get_msr_common);
>> >>> +/*
>> >>> + * Returns true if the MSR in question is managed via XSTATE, i.e. is context
>> >>> + * switched with the rest of guest FPU state.
>> >>> + */
>> >>> +static bool is_xstate_managed_msr(u32 index)
>> >>> +{
>> >>> +	switch (index) {
>> >>> +	case MSR_IA32_U_CET:
>> >>
>> >>
>> >>Why MSR_IA32_S_CET is not included here?
>
>Because the guest's S_CET must *never* be resident in hardware while running in
>the host. Doing so would create an egregious security issue due to letting the
>guest disable IBT and/or shadow stacks, or alternatively crash the host by
>enabling one or the other.

+1000

I completely missed this point.

>
>Having guest MSR_IA32_PL[0-3]_SSP resident in hardware while the _kernel_ is
>running is safe, because those MSRs are only consumed on transitions to lower
>privilege levels, i.e. from KVM's perspective, they're effectively user-return
>MSRs that get restored on exit to userspace thanks to kvm_{load,put}_guest_fpu()
>context switching between VMM and guest state (if the vCPU task is preempted,
>the normal context switch code handles swapping state between tasks, it's only
>the VMM vs. guest state that needs dedicated handling since they are the same
>task).
>
>Context switching S_CET as part of XRSTORS very, VERY subtly works by virtue of
>S_CET already being loaded with the host's value on VM-Exit. I.e. the value
>saved into guest FPU state is always the host's value, and thus the value loaded
>from guest FPU state is always the host's value.

Looks like the host's value for a given vCPU should be constant here. I'm not
sure if this will change in the future, but I think it's unlikely.

>
>And because all of that isn't enough, the final wrinkle is that KVM_{G,S}ET_XSAVE
>only operate on xcr0 / user state, i.e. don't allow userspace to load supervisor
>(S_CET) state into the kernel.

Yes. Userspace cannot see supervisor state in the guest FPU and should read the
guest's S_CET/MSR_IA32_PL[0-3]_SSP via KVM_GET_MSRS or KVM_GET_ONE_REG.
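
To summarize the classification discussed above, here is a minimal standalone sketch. It is not the body of is_xstate_managed_msr() from the patch, which is truncated in the quote. The toy_ name is hypothetical. Including MSR_IA32_PL[0-3]_SSP is inferred from Sean's explanation, not copied from the diff. The MSR indices are the architectural values, redefined locally so the snippet builds in userspace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CET MSR indices, redefined here so the sketch is standalone. */
#define MSR_IA32_U_CET   0x000006a0u
#define MSR_IA32_S_CET   0x000006a2u
#define MSR_IA32_PL0_SSP 0x000006a4u
#define MSR_IA32_PL1_SSP 0x000006a5u
#define MSR_IA32_PL2_SSP 0x000006a6u
#define MSR_IA32_PL3_SSP 0x000006a7u

/*
 * Hypothetical helper: returns true if the guest's value for this MSR may
 * stay resident in hardware while the host kernel runs, i.e. it is context
 * switched together with the rest of the guest FPU state.
 *
 * ASSUMPTION: the PL[0-3]_SSP cases are inferred from the discussion, since
 * those MSRs are only consumed on transitions to a lower privilege level.
 */
static bool toy_is_xstate_managed_msr(uint32_t index)
{
	switch (index) {
	case MSR_IA32_U_CET:
	case MSR_IA32_PL0_SSP:
	case MSR_IA32_PL1_SSP:
	case MSR_IA32_PL2_SSP:
	case MSR_IA32_PL3_SSP:
		return true;
	default:
		/*
		 * Everything else, S_CET in particular: the guest's value
		 * must never be loaded in hardware while the host runs.
		 */
		return false;
	}
}
```

With this split, toy_is_xstate_managed_msr(MSR_IA32_S_CET) is false, which matches the point that the guest's S_CET has to be accessed through a path that does not load it into hardware in host context.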