On Tue, 2025-04-22 at 16:33 -0700, Sean Christopherson wrote:
> On Tue, Apr 15, 2025, Maxim Levitsky wrote:
> > Instead of reading and writing GUEST_IA32_DEBUGCTL vmcs field directly,
> > wrap the logic with get/set functions.
>
> Why?  I know why the "set" helper is being added, but it needs to called out.
>
> Please omit the getter entirely, it does nothing more than obfuscate a very
> simple line of code.

In this patch, yes. But in the next patch I switch to reading from
'vmx->msr_ia32_debugctl'. Do you want me to open code this access?
I don't mind, if you insist.

>
> > Also move the checks that the guest's supplied value is valid to the new
> > 'set' function.
>
> Please do this in a separate patch.  There's no need to mix refactoring and
> functional changes.

I thought that it was natural to do this in the same patch.

In this patch I introduce 'vmx_set_guest_debugctl', which should be used any
time we set the MSR given the guest value, and VM entry is one of these cases.

I can split this if you want.

>
> > In particular, the above change fixes a minor security issue in which L1
>
> Bug, yes.  Not sure it constitutes a meaningful security issue though.

I also think so, but I wanted to mention this just in case.

>
> > hypervisor could set the GUEST_IA32_DEBUGCTL, and eventually the host's
> > MSR_IA32_DEBUGCTL
>
> No, the lack of a consistency check allows the guest to set the MSR in hardware,
> but that is not the host's value.

That's what I meant - the guest can set the real hardware MSR.
Yes, after the guest exits, the OS value is restored. I'll rephrase this in v2.

>
> > to any value by performing a VM entry to L2 with VM_ENTRY_LOAD_DEBUG_CONTROLS
> > set.
>
> Any *legal* value.  Setting completely unsupported bits will result in VM-Enter
> failing with a consistency check VM-Exit.

True.

>
> > Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> > ---
> >  arch/x86/kvm/vmx/nested.c    | 15 +++++++---
> >  arch/x86/kvm/vmx/pmu_intel.c |  9 +++---
> >  arch/x86/kvm/vmx/vmx.c       | 58 +++++++++++++++++++++++-------------
> >  arch/x86/kvm/vmx/vmx.h       |  3 ++
> >  4 files changed, 57 insertions(+), 28 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index e073e3008b16..b7686569ee09 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -2641,6 +2641,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >  	struct hv_enlightened_vmcs *evmcs = nested_vmx_evmcs(vmx);
> >  	bool load_guest_pdptrs_vmcs12 = false;
> > +	u64 new_debugctl;
> >
> >  	if (vmx->nested.dirty_vmcs12 || nested_vmx_is_evmptr12_valid(vmx)) {
> >  		prepare_vmcs02_rare(vmx, vmcs12);
> > @@ -2653,11 +2654,17 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
> >  	if (vmx->nested.nested_run_pending &&
> >  	    (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
> >  		kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
> > -		vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
> > +		new_debugctl = vmcs12->guest_ia32_debugctl;
> >  	} else {
> >  		kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
> > -		vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
> > +		new_debugctl = vmx->nested.pre_vmenter_debugctl;
> >  	}
> > +
> > +	if (CC(!vmx_set_guest_debugctl(vcpu, new_debugctl, false))) {
>
> The consistency check belongs in nested_vmx_check_guest_state(), only needs to
> check the VM_ENTRY_LOAD_DEBUG_CONTROLS case, and should be posted as a separate
> patch.

I can move it there. Can you explain why though you want this? Is it because of
the order of checks specified in the PRM?

Currently GUEST_IA32_DEBUGCTL of the host is *written* in prepare_vmcs02.
Should I also move this write to nested_vmx_check_guest_state?
Or should I write the value blindly in prepare_vmcs02 and then check the value
of 'vmx->msr_ia32_debugctl' in nested_vmx_check_guest_state and fail if the
value contains reserved bits? I don't like that idea that much, IMHO.

>
> > +		*entry_failure_code = ENTRY_FAIL_DEFAULT;
> > +		return -EINVAL;
> > +	}
> > +
> > +static void __vmx_set_guest_debugctl(struct kvm_vcpu *vcpu, u64 data)
> > +{
> > +	vmcs_write64(GUEST_IA32_DEBUGCTL, data);
> > +}
> > +
> > +bool vmx_set_guest_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
> > +{
> > +	u64 invalid = data & ~vmx_get_supported_debugctl(vcpu, host_initiated);
> > +
> > +	if (invalid & (DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR)) {
> > +		kvm_pr_unimpl_wrmsr(vcpu, MSR_IA32_DEBUGCTLMSR, data);
> > +		data &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
> > +		invalid &= ~(DEBUGCTLMSR_BTF|DEBUGCTLMSR_LBR);
> > +	}
> > +
> > +	if (invalid)
> > +		return false;
> > +
> > +	if (is_guest_mode(vcpu) && (get_vmcs12(vcpu)->vm_exit_controls &
> > +					VM_EXIT_SAVE_DEBUG_CONTROLS))
> > +		get_vmcs12(vcpu)->guest_ia32_debugctl = data;
> > +
> > +	if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
> > +	    (data & DEBUGCTLMSR_LBR))
> > +		intel_pmu_create_guest_lbr_event(vcpu);
> > +
> > +	__vmx_set_guest_debugctl(vcpu, data);
> > +	return true;
>
> Return 0/-errno, not true/false.

There are plenty of functions in this file and in KVM that return a boolean,
e.g.:

static bool nested_vmx_check_eptp(struct kvm_vcpu *vcpu, u64 new_eptp)
static inline bool vmx_control_verify(u32 control, u32 low, u32 high)
static bool nested_evmcs_handle_vmclear(struct kvm_vcpu *vcpu, gpa_t vmptr)
static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
						 struct vmcs12 *vmcs12)
static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
...

I personally think that functions that emulate hardware should return boolean
values or some hardware-specific status code (e.g. a VMX failure code), because
the real hardware never returns -EINVAL and such.

Best regards,
	Maxim Levitsky
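
For reference, the masking logic of the quoted vmx_set_guest_debugctl() hunk and the two return conventions under discussion can be modelled in plain userspace C. This is only a sketch, not kernel code: the 'supported' argument stands in for the result of vmx_get_supported_debugctl(), '*out' stands in for the VMCS write, and every name with a sketch_/SKETCH_ prefix is made up for illustration. The vmcs12 update and the LBR event creation are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* LBR is bit 0 and BTF is bit 1 of IA32_DEBUGCTL. */
#define DEBUGCTLMSR_LBR		(1ULL << 0)
#define DEBUGCTLMSR_BTF		(1ULL << 1)

/* Arbitrary extra bit, used only to exercise the reject path (assumption). */
#define SKETCH_OTHER_BIT	(1ULL << 11)

/*
 * Model of the validation step.  'supported' replaces the call to
 * vmx_get_supported_debugctl() and '*out' replaces the VMCS write;
 * both are assumptions made for this sketch.
 */
static bool sketch_set_guest_debugctl(uint64_t supported, uint64_t data,
				      uint64_t *out)
{
	uint64_t invalid = data & ~supported;

	/* Unsupported BTF/LBR are dropped rather than rejected. */
	if (invalid & (DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR)) {
		data &= ~(DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR);
		invalid &= ~(DEBUGCTLMSR_BTF | DEBUGCTLMSR_LBR);
	}

	/* Any other unsupported bit fails the write. */
	if (invalid)
		return false;

	*out = data;
	return true;
}

/* Caller that maps the boolean result to 0/-errno. */
static int sketch_prepare(uint64_t supported, uint64_t data, uint64_t *out)
{
	if (!sketch_set_guest_debugctl(supported, data, out))
		return -EINVAL;
	return 0;
}

static void sketch_selftest(void)
{
	uint64_t v = 0;

	/* A supported LBR bit passes through unchanged. */
	assert(sketch_prepare(DEBUGCTLMSR_LBR, DEBUGCTLMSR_LBR, &v) == 0);
	assert(v == DEBUGCTLMSR_LBR);

	/* An unsupported BTF bit is silently cleared. */
	assert(sketch_prepare(0, DEBUGCTLMSR_BTF, &v) == 0 && v == 0);

	/* Any other unsupported bit is rejected. */
	assert(sketch_prepare(0, SKETCH_OTHER_BIT, &v) == -EINVAL);
}
```

As in the quoted hunk, both BTF and LBR are cleared as soon as either one is unsupported. The sketch_prepare() wrapper shows that a boolean helper and a 0/-errno caller can coexist, with the conversion done at the call site.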