On 4/9/2025 4:13 AM, Sean Christopherson wrote: > On Tue, Apr 01, 2025, Maxim Levitsky wrote: >> On Thu, 2025-02-27 at 14:24 -0800, Sean Christopherson wrote: >>> Fix a long-lurking bug in SVM where KVM runs the guest with the host's >>> DEBUGCTL if LBR virtualization is disabled. AMD CPUs rather stupidly >>> context switch DEBUGCTL if and only if LBR virtualization is enabled (not >>> just supported, but fully enabled). >>> >>> The bug has gone unnoticed because until recently, the only bits that >>> KVM would leave set were things like BTF, which are guest visible but >>> won't cause functional problems unless guest software is being especially >>> particular about #DBs. >>> >>> The bug was exposed by the addition of BusLockTrap ("Detect" in the kernel), >>> as the resulting #DBs due to split-lock accesses in guest userspace (lol >>> Steam) get reflected into the guest by KVM. >>> >>> Note, I don't love suppressing DEBUGCTL.BTF, but practically speaking that's >>> likely the behavior that SVM guests have gotten the vast, vast majority of >>> the time, and given that it's the behavior on Intel, it's (hopefully) a safe >>> option for a fix, e.g. versus trying to add proper BTF virtualization on the >>> fly. >>> >>> v3: >>> - Suppress BTF, as KVM doesn't actually support it. [Ravi] >>> - Actually load the guest's DEBUGCTL (though amusingly, with BTF squashed, >>> it's guaranteed to be '0' in this scenario). [Ravi] >>> >>> v2: >>> - Load the guest's DEBUGCTL instead of simply zeroing it on VMRUN. >>> - Drop bits 5:3 from guest DEBUGCTL so that KVM doesn't let the guest >>> unintentionally enable BusLockTrap (AMD repurposed bits). [Ravi] >>> - Collect a review. [Xiaoyao] >>> - Make bits 5:3 fully reserved, in a separate not-for-stable patch. >>> >>> v1: https://lore.kernel.org/all/20250224181315.2376869-1-seanjc@xxxxxxxxxx >>> >> >> >> Hi, >> >> Amusingly there is another DEBUGCTL issue, which I just got to the bottom of. >> (if I am not mistaken of course). >> >> We currently don't let the guest set DEBUGCTL.FREEZE_WHILE_SMM and neither >> set it ourselves in GUEST_IA32_DEBUGCTL vmcs field, even when supported by the host >> (If I read the code correctly, I didn't verify this in runtime) > > Ugh, SMM. Yeah, KVM doesn't propagate DEBUGCTLMSR_FREEZE_IN_SMM to the guest > value. KVM intercepts reads and writes to DEBUGCTL, so it should be easy enough > to shove the bit in on writes, and drop it on reads. > >> This means that the host #SMIs will interfere with the guest PMU. In >> particular this causes the 'pmu' kvm-unit-test to fail, which is something >> that our CI caught. >> >> I think that kvm should just set this bit, or even better, use the host value >> of this bit, and hide it from the guest, because the guest shouldn't know >> about host's smm, and we AFAIK don't really support freezing perfmon when the >> guest enters its own emulated SMM. > > Agreed. Easy thing is to use the host's value, so that KVM doesn't need to check > for its existence. I can't think of anything that would go sideways by freezing > perfmon if the host happens to take an SMI. > >> What do you think? I'll post patches if you think that this is a good idea. >> (A temp hack to set this bit always in GUEST_IA32_DEBUGCTL fixed the problem for me) >> >> I also need to check if AMD also has this feature, or if this is Intel specific. > > Intel only. I assume/think/hope AMD's Host/Guest Only field in the event selector > effectively hides SMM from the guest. Just using the GuestOnly bit does not hide SMM activity from guests. SMIs are generally intercepted (kvm_amd.intercept_smi defaults to true) and handled in the host context. So guest PMCs are isolated by a combination of having the GuestOnly bit set and the #VMEXITs resulting from SMI interception.