+Jim On Thu, May 22, 2025, Sean Christopherson wrote: > On Wed, May 21, 2025, Maxim Levitsky wrote: > > Check the vmcs12 guest_ia32_debugctl value before loading it, to avoid L2 > > being able to load arbitrary values to hardware IA32_DEBUGCTL. > > > > Reviewed-by: Chao Gao <chao.gao@xxxxxxxxx> > > Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx> > > --- > > arch/x86/kvm/vmx/nested.c | 3 ++- > > arch/x86/kvm/vmx/vmx.c | 2 +- > > arch/x86/kvm/vmx/vmx.h | 1 + > > 3 files changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c > > index e073e3008b16..00f2b762710c 100644 > > --- a/arch/x86/kvm/vmx/nested.c > > +++ b/arch/x86/kvm/vmx/nested.c > > @@ -3146,7 +3146,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu, > > return -EINVAL; > > > > if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) && > > - CC(!kvm_dr7_valid(vmcs12->guest_dr7))) > > + (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) || > > + CC(vmcs12->guest_ia32_debugctl & ~vmx_get_supported_debugctl(vcpu, false)))) > > This is a breaking change. For better or worse (read: worse), KVM's ABI is to > drop BTF and LBR if they're unsupported (the former is always unsupported). > Failure to honor that ABI means L1 can't excplitly load what it think is its > current value into L2. > > I'll slot in a path to provide another helper for checking the validity of > DEBUGCTL. I think I've managed to cobble together something that isn't too > horrific (options are a bit limited due to the existing ugliness). And then Jim ruined my day. :-) As evidenced by this hilarious KUT testcase, it's entirely possible there are existing KVM guests that are utilizing/abusing DEBUGCTL features. /* * Optional RTM test for hardware that supports RTM, to * demonstrate that the current volume 3 of the SDM * (325384-067US), table 27-1 is incorrect. Bit 16 of the exit * qualification for debug exceptions is not reserved. It is * set to 1 if a debug exception (#DB) or a breakpoint * exception (#BP) occurs inside an RTM region while advanced * debugging of RTM transactional regions is enabled. */ if (this_cpu_has(X86_FEATURE_RTM)) { vmcs_write(ENT_CONTROLS, vmcs_read(ENT_CONTROLS) | ENT_LOAD_DBGCTLS); /* * Set DR7.RTM[bit 11] and IA32_DEBUGCTL.RTM[bit 15] * in the guest to enable advanced debugging of RTM * transactional regions. */ vmcs_write(GUEST_DR7, BIT(11)); vmcs_write(GUEST_DEBUGCTL, BIT(15)); single_step_guest("Hardware delivered single-step in " "transactional region", starting_dr6, 0); check_db_exit(false, false, false, &xbegin, BIT(16), starting_dr6); } else { vmcs_write(GUEST_RIP, (u64)&skip_rtm); enter_guest(); } For RTM specifically, disallowing DEBUGCTL.RTM but allowing DR7.RTM seems a bit silly. Unless there's a security concern, that can probably be fixed by adding support for RTM. Note, there's also a virtualization hole here, as KVM doesn't vet DR7 beyond checking that bits 63:32 are zero, i.e. a guest could set DR7.RTM even if KVM doesn't advertise support. Of course, closing that hole would require completely dropping support for disabling DR interception, since VMX doesn't give per-DR controls. For the other bits, I don't see a good solution. The only viable options I see are to silently drop all unsupported bits (maybe with a quirk?), or enforce all bits and cross our fingers that no L1 VMM is running guests with those bits set in GUEST_DEBUGCTL.