On Thu, May 29, 2025 at 04:39:53PM -0700, Sean Christopherson wrote: >Use a dedicated array of MSRPM offsets to merge L0 and L1 bitmaps, i.e. to >merge KVM's vmcb01 bitmap with L1's vmcb12 bitmap. This will eventually >allow for the removal of direct_access_msrs, as the only path where >tracking the offsets is truly justified is the merge for nested SVM, where >merging in chunks is an easy way to batch uaccess reads/writes. > >Opportunistically omit the x2APIC MSRs from the merge-specific array >instead of filtering them out at runtime. > >Note, disabling interception of XSS, EFER, PAT, GHCB, and TSC_AUX is >mutually exclusive with nested virtualization, as KVM passes through the >MSRs only for SEV-ES guests, and KVM doesn't support nested virtualization >for SEV+ guests. Defer removing those MSRs to a future cleanup in order >to make this refactoring as benign as possible. > >Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> >--- > arch/x86/kvm/svm/nested.c | 72 +++++++++++++++++++++++++++++++++------ > arch/x86/kvm/svm/svm.c | 4 +++ > arch/x86/kvm/svm/svm.h | 2 ++ > 3 files changed, 67 insertions(+), 11 deletions(-) > >diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c >index 89a77f0f1cc8..e53020939e60 100644 >--- a/arch/x86/kvm/svm/nested.c >+++ b/arch/x86/kvm/svm/nested.c >@@ -184,6 +184,64 @@ void recalc_intercepts(struct vcpu_svm *svm) > } > } > >+static int nested_svm_msrpm_merge_offsets[9] __ro_after_init; I understand how the array size (i.e., 9) was determined :). But, adding a comment explaining this would be quite helpful >+static int nested_svm_nr_msrpm_merge_offsets __ro_after_init; >+ >+int __init nested_svm_init_msrpm_merge_offsets(void) >+{ >+ const u32 merge_msrs[] = { >+ MSR_STAR, >+ MSR_IA32_SYSENTER_CS, >+ MSR_IA32_SYSENTER_EIP, >+ MSR_IA32_SYSENTER_ESP, >+ #ifdef CONFIG_X86_64 >+ MSR_GS_BASE, >+ MSR_FS_BASE, >+ MSR_KERNEL_GS_BASE, >+ MSR_LSTAR, >+ MSR_CSTAR, >+ MSR_SYSCALL_MASK, >+ #endif >+ MSR_IA32_SPEC_CTRL, >+ MSR_IA32_PRED_CMD, >+ MSR_IA32_FLUSH_CMD, MSR_IA32_DEBUGCTLMSR is missing, but it's benign since it shares the same offset as MSR_IA32_LAST* below. I'm a bit concerned that we might overlook adding new MSRs to this array in the future, which could lead to tricky bugs. But I have no idea how to avoid this. Removing this array and iterating over direct_access_msrs[] directly is an option but it contradicts this series as one of its purposes is to remove direct_access_msrs[]. >+ MSR_IA32_LASTBRANCHFROMIP, >+ MSR_IA32_LASTBRANCHTOIP, >+ MSR_IA32_LASTINTFROMIP, >+ MSR_IA32_LASTINTTOIP, >+ >+ MSR_IA32_XSS, >+ MSR_EFER, >+ MSR_IA32_CR_PAT, >+ MSR_AMD64_SEV_ES_GHCB, >+ MSR_TSC_AUX, >+ }; > > if (kvm_vcpu_read_guest(vcpu, offset, &value, 4)) >diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c >index 1c70293400bc..84dd1f220986 100644 >--- a/arch/x86/kvm/svm/svm.c >+++ b/arch/x86/kvm/svm/svm.c >@@ -5689,6 +5689,10 @@ static int __init svm_init(void) > if (!kvm_is_svm_supported()) > return -EOPNOTSUPP; > >+ r = nested_svm_init_msrpm_merge_offsets(); >+ if (r) >+ return r; >+ If the offset array is used for nested virtualization only, how about guarding this with nested virtualization? For example, in svm_hardware_setup(): if (nested) { r = nested_svm_init_msrpm_merge_offsets(); if (r) goto err; pr_info("Nested Virtualization enabled\n"); kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE); } > r = kvm_x86_vendor_init(&svm_init_ops); > if (r) > return r; >diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h >index 909b9af6b3c1..0a8041d70994 100644 >--- a/arch/x86/kvm/svm/svm.h >+++ b/arch/x86/kvm/svm/svm.h >@@ -686,6 +686,8 @@ static inline bool nested_exit_on_nmi(struct vcpu_svm *svm) > return vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_NMI); > } > >+int __init nested_svm_init_msrpm_merge_offsets(void); >+ > int enter_svm_guest_mode(struct kvm_vcpu *vcpu, > u64 vmcb_gpa, struct vmcb *vmcb12, bool from_vmrun); > void svm_leave_nested(struct kvm_vcpu *vcpu); >-- >2.49.0.1204.g71687c7c1d-goog >