On Sat, Jul 26, 2025 at 12:07 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> The highlights are the DEBUGCTL.FREEZE_IN_SMM fix from Maxim, Jim's APERF/MPERF
> support that has probably made him question the meaning of life, and a big
> cleanup of the MSR interception code to ease the pain of adding support for
> CET, FRED, and the mediated PMU (and any other features that deal with MSRs).
>
> But the one change that I really want your eyeballs on is that last commit,
> "Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created"; it's an ABI
> change that could break userspace. AFAICT, it won't affect any (known)
> userspace, and restricting the ioctl for all VM types is much simpler than
> special casing "secure" TSC guests. Holler if you want a new tag/pull request
> without that change; I deliberately kept it dead last specifically so it could
> be omitted without any fuss.

No problem there. It makes no sense to use the VM ioctl if you can't issue it
before vCPU creation; the whole point is to have a homogeneous frequency.

Paolo

> The following changes since commit 28224ef02b56fceee2c161fe2a49a0bb197e44f5:
>
>   KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities (2025-06-20 14:20:20 -0400)
>
> are available in the Git repository at:
>
>   https://github.com/kvm-x86/linux.git tags/kvm-x86-misc-6.17
>
> for you to fetch changes up to dcbe5a466c123a475bb66492749549f09b5cab00:
>
>   KVM: x86: Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created (2025-07-14 15:29:33 -0700)
>
> ----------------------------------------------------------------
> KVM x86 misc changes for 6.17
>
>  - Preserve the host's DEBUGCTL.FREEZE_IN_SMM (Intel only) when running the
>    guest. Failure to honor FREEZE_IN_SMM can bleed host state into the guest.
>
>  - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter (Intel only) to
>    prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF.
>
>  - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the
>    vCPU's CPUID model.
>
>  - Rework the MSR interception code so that the SVM and VMX APIs are more or
>    less identical.
>
>  - Recalculate all MSR intercepts from the "source" on MSR filter changes, and
>    drop the dedicated "shadow" bitmaps (and their awful "max" size defines).
>
>  - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the
>    nested SVM MSRPM offsets tracker can't handle an MSR.
>
>  - Advertise support for LKGS (Load Kernel GS base), a new instruction that's
>    loosely related to FRED, but is supported and enumerated independently.
>
>  - Fix a user-triggerable WARN that syzkaller found by stuffing INIT_RECEIVED,
>    a.k.a. WFS, and then putting the vCPU into VMX Root Mode (post-VMXON). Use
>    the same approach KVM uses for dealing with "impossible" emulation when
>    running a !URG guest, and simply wait until KVM_RUN to detect that the vCPU
>    has architecturally impossible state.
>
>  - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of
>    APERF/MPERF reads, so that a "properly" configured VM can "virtualize"
>    APERF/MPERF (with many caveats).
>
>  - Reject KVM_SET_TSC_KHZ if vCPUs have been created, as changing the "default"
>    frequency is unsupported for VMs with a "secure" TSC, and there's no known
>    use case for changing the default frequency for other VM types.
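
The ordering rule in the last item above (set the VM-wide default frequency
first, create vCPUs afterwards) can be illustrated with a small toy model.
This is only a sketch and not KVM's code: struct toy_vm, toy_vm_set_tsc_khz()
and toy_vm_create_vcpu() are hypothetical names, and failing with -EINVAL is
an assumption about the error code rather than something stated in this
thread.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for per-VM state; not the kernel's struct kvm. */
struct toy_vm {
	unsigned int created_vcpus;	/* number of vCPUs created so far */
	uint32_t default_tsc_khz;	/* frequency new vCPUs start with */
};

/*
 * Models the VM-scoped KVM_SET_TSC_KHZ after this change: the default
 * frequency may only be changed while no vCPU exists.
 */
int toy_vm_set_tsc_khz(struct toy_vm *vm, uint32_t khz)
{
	if (vm->created_vcpus)
		return -EINVAL;	/* assumed errno, see the note above */

	vm->default_tsc_khz = khz;
	return 0;
}

/* Models vCPU creation: the new vCPU picks up the VM's current default. */
uint32_t toy_vm_create_vcpu(struct toy_vm *vm)
{
	vm->created_vcpus++;
	return vm->default_tsc_khz;
}
```

In this model every vCPU sees the same frequency, because the default can no
longer move once the first vCPU exists. That matches the point made in the
reply above about wanting one uniform frequency across the VM.
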
>
> ----------------------------------------------------------------
> Chao Gao (2):
>       KVM: x86: Deduplicate MSR interception enabling and disabling
>       KVM: SVM: Simplify MSR interception logic for IA32_XSS MSR
>
> Jim Mattson (3):
>       KVM: x86: Replace growing set of *_in_guest bools with a u64
>       KVM: x86: Provide a capability to disable APERF/MPERF read intercepts
>       KVM: selftests: Test behavior of KVM_X86_DISABLE_EXITS_APERFMPERF
>
> Kai Huang (1):
>       KVM: x86: Reject KVM_SET_TSC_KHZ VM ioctl when vCPUs have been created
>
> Maxim Levitsky (3):
>       KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter
>       KVM: VMX: Wrap all accesses to IA32_DEBUGCTL with getter/setter APIs
>       KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest
>
> Sean Christopherson (44):
>       KVM: TDX: Use kvm_arch_vcpu.host_debugctl to restore the host's DEBUGCTL
>       KVM: x86: Convert vcpu_run()'s immediate exit param into a generic bitmap
>       KVM: x86: Drop kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag
>       KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is supported
>       KVM: VMX: Extract checking of guest's DEBUGCTL into helper
>       KVM: SVM: Disable interception of SPEC_CTRL iff the MSR exists for the guest
>       KVM: SVM: Allocate IOPM pages after initial setup in svm_hardware_setup()
>       KVM: SVM: Don't BUG if setting up the MSR intercept bitmaps fails
>       KVM: SVM: Tag MSR bitmap initialization helpers with __init
>       KVM: SVM: Use ARRAY_SIZE() to iterate over direct_access_msrs
>       KVM: SVM: Kill the VM instead of the host if MSR interception is buggy
>       KVM: x86: Use non-atomic bit ops to manipulate "shadow" MSR intercepts
>       KVM: SVM: Massage name and param of helper that merges vmcb01 and vmcb12 MSRPMs
>       KVM: SVM: Clean up macros related to architectural MSRPM definitions
>       KVM: nSVM: Use dedicated array of MSRPM offsets to merge L0 and L1 bitmaps
>       KVM: nSVM: Omit SEV-ES specific passthrough MSRs from L0+L1 bitmap merge
>       KVM: nSVM: Don't initialize vmcb02 MSRPM with vmcb01's "always passthrough"
>       KVM: SVM: Add helpers for accessing MSR bitmap that don't rely on offsets
>       KVM: SVM: Implement and adopt VMX style MSR intercepts APIs
>       KVM: SVM: Pass through GHCB MSR if and only if VM is an SEV-ES guest
>       KVM: SVM: Drop "always" flag from list of possible passthrough MSRs
>       KVM: x86: Move definition of X2APIC_MSR() to lapic.h
>       KVM: VMX: Manually recalc all MSR intercepts on userspace MSR filter change
>       KVM: SVM: Manually recalc all MSR intercepts on userspace MSR filter change
>       KVM: x86: Rename msr_filter_changed() => recalc_msr_intercepts()
>       KVM: SVM: Rename init_vmcb_after_set_cpuid() to make it intercepts specific
>       KVM: SVM: Fold svm_vcpu_init_msrpm() into its sole caller
>       KVM: SVM: Merge "after set CPUID" intercept recalc helpers
>       KVM: SVM: Drop explicit check on MSRPM offset when emulating SEV-ES accesses
>       KVM: SVM: Move svm_msrpm_offset() to nested.c
>       KVM: SVM: Store MSRPM pointer as "void *" instead of "u32 *"
>       KVM: nSVM: Access MSRPM in 4-byte chunks only for merging L0 and L1 bitmaps
>       KVM: SVM: Return -EINVAL instead of MSR_INVALID to signal out-of-range MSR
>       KVM: nSVM: Merge MSRPM in 64-bit chunks on 64-bit kernels
>       KVM: SVM: Add a helper to allocate and initialize permissions bitmaps
>       KVM: x86: Simplify userspace filter logic when disabling MSR interception
>       KVM: selftests: Verify KVM disable interception (for userspace) on filter change
>       KVM: x86: Drop pending_smi vs. INIT_RECEIVED check when setting MP_STATE
>       KVM: x86: WARN and reject KVM_RUN if vCPU's MP_STATE is SIPI_RECEIVED
>       KVM: x86: Move INIT_RECEIVED vs. INIT/SIPI blocked check to KVM_RUN
>       KVM: x86: Refactor handling of SIPI_RECEIVED when setting MP_STATE
>       KVM: VMX: Add a macro to track which DEBUGCTL bits are host-owned
>       KVM: selftests: Expand set of APIs for pinning tasks to a single CPU
>       KVM: selftests: Convert arch_timer tests to common helpers to pin task
>
> Xin Li (1):
>       KVM: x86: Advertise support for LKGS
>
>  Documentation/virt/kvm/api.rst                    |  25 +-
>  arch/x86/include/asm/kvm-x86-ops.h                |   3 +-
>  arch/x86/include/asm/kvm_host.h                   |  22 +-
>  arch/x86/include/asm/msr-index.h                  |   1 +
>  arch/x86/kvm/cpuid.c                              |   1 +
>  arch/x86/kvm/lapic.h                              |   2 +
>  arch/x86/kvm/svm/nested.c                         | 128 ++++--
>  arch/x86/kvm/svm/sev.c                            |  33 +-
>  arch/x86/kvm/svm/svm.c                            | 500 +++++++--------------
>  arch/x86/kvm/svm/svm.h                            | 104 ++++-
>  arch/x86/kvm/vmx/common.h                         |   2 -
>  arch/x86/kvm/vmx/main.c                           |  23 +-
>  arch/x86/kvm/vmx/nested.c                         |  27 +-
>  arch/x86/kvm/vmx/pmu_intel.c                      |   8 +-
>  arch/x86/kvm/vmx/tdx.c                            |  24 +-
>  arch/x86/kvm/vmx/vmx.c                            | 284 ++++--------
>  arch/x86/kvm/vmx/vmx.h                            |  61 ++-
>  arch/x86/kvm/vmx/x86_ops.h                        |   6 +-
>  arch/x86/kvm/x86.c                                | 106 +++--
>  arch/x86/kvm/x86.h                                |  18 +-
>  include/uapi/linux/kvm.h                          |   1 +
>  tools/include/uapi/linux/kvm.h                    |   1 +
>  tools/testing/selftests/kvm/Makefile.kvm          |   1 +
>  tools/testing/selftests/kvm/arch_timer.c          |   7 +-
>  .../selftests/kvm/arm64/arch_timer_edge_cases.c   |  23 +-
>  tools/testing/selftests/kvm/include/kvm_util.h    |  31 +-
>  tools/testing/selftests/kvm/lib/kvm_util.c        |  15 +-
>  tools/testing/selftests/kvm/lib/memstress.c       |   2 +-
>  tools/testing/selftests/kvm/x86/aperfmperf_test.c | 213 +++++++++
>  .../selftests/kvm/x86/userspace_msr_exit_test.c   |   8 +
>  30 files changed, 930 insertions(+), 750 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/x86/aperfmperf_test.c
>
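
For readers skimming the shortlog: the "Replace growing set of *_in_guest
bools with a u64" and KVM_X86_DISABLE_EXITS_APERFMPERF entries describe
tracking disabled exits as bits in a single mask. The following is a minimal
sketch of that idea only. All TOY_* names, struct toy_vm_arch and the helpers
are hypothetical, and the bit positions are an assumption about the uapi
header, not taken from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; bit positions are assumed, not copied from the uapi header. */
#define TOY_DISABLE_EXITS_MWAIT		(1ULL << 0)
#define TOY_DISABLE_EXITS_HLT		(1ULL << 1)
#define TOY_DISABLE_EXITS_PAUSE		(1ULL << 2)
#define TOY_DISABLE_EXITS_CSTATE	(1ULL << 3)
#define TOY_DISABLE_EXITS_APERFMPERF	(1ULL << 4)

/* One mask replaces a separate bool per feature. */
struct toy_vm_arch {
	uint64_t disabled_exits;
};

/* Record that userspace asked for the given exits to be disabled. */
void toy_disable_exits(struct toy_vm_arch *arch, uint64_t mask)
{
	arch->disabled_exits |= mask;
}

/* Generic query: is this particular exit disabled for the VM? */
bool toy_exit_disabled(const struct toy_vm_arch *arch, uint64_t bit)
{
	return (arch->disabled_exits & bit) != 0;
}

/* Per-feature wrapper in the spirit of the old *_in_guest bools. */
bool toy_aperfmperf_in_guest(const struct toy_vm_arch *arch)
{
	return toy_exit_disabled(arch, TOY_DISABLE_EXITS_APERFMPERF);
}
```

With a mask, adding another disable-able exit costs one new bit and, at most,
one thin wrapper, rather than another field threaded through the VM state.
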