On Tue, Sep 02, 2025, David Woodhouse wrote:
> On Fri, 2025-08-29 at 13:36 -0700, Sean Christopherson wrote:
> > > This does mean userspace would have to set the vCPU's TSC frequency and
> > > then query the kernel before setting up its CPUID. And in the absence
> > > of scaling, this KVM API would report the hardware TSC frequency.
> >
> > Reporting the hardware TSC frequency on CPUs without scaling seems all kinds of
> > wrong (which is another reason I don't like KVM shoving in the state). Of course,
> > reporting the frequency KVM is trying to provide isn't great either, as the guest
> > will definitely observe something in between those two.
>
> Yes, on CPUs that don't support TSC scaling, we should not attempt to
> advertise a frequency.
>
> Where I said 'in the absence of scaling' I meant modern CPUs but where
> the VMM just didn't ask for TSC scaling.
>
> > > I guess the API would have to return -EHARDWARETOOSTUPID if the TSC frequency
> > > *isn't* the same across all CPUs and all power states, etc.
> >
> > What if KVM advertises the flag in KVM_GET_SUPPORTED_CPUID if and only if the
> > TSC will be constant from the guest's perspective? TSC scaling has been supported
> > by AMD and Intel for ~10 years, it doesn't seem at all unreasonable to restrict
> > the feature to somewhat modern hardware. And if userspace or the admin knows
> > better than KVM, then userspace can always ignore KVM and report the frequency
> > anyways.
>
> I hadn't put it in KVM_GET_SUPPORTED_CPUID; I was following the lead of
> the existing Xen leaf support, where *if* userspace provides that leaf,
> KVM will dynamically correct the values in it.
>
> The problem is that KVM_GET_SUPPORTED_CPUID is a *system* ioctl on the
> bare /dev/kvm device, isn't it?

Yep.

> So even if a VMM has set the TSC frequency VM-wide with KVM_SET_TSC_KHZ
> instead of doing it the old per-vCPU way, how can it get the results for a
> specific VM?

I don't see any need for userspace to query per-VM support.

What I'm proposing is that KVM advertise the feature if the bare metal TSC is constant and the CPU supports TSC scaling. Beyond that, _KVM_ doesn't need to do anything to ensure the guest sees a constant frequency, it's userspace's responsibility to provide a sane configuration. And strictly speaking, CPUID is per-CPU, i.e. it's architecturally legal to set per-vCPU frequencies and then advertise a different frequency in CPUID for each vCPU. That's all but guaranteed to break guests as most/all kernels assume that TSC operates at the same frequency on all CPUs, but as above, that's userspace's responsibility to not screw up.
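A minimal C sketch of the proposed split of responsibilities follows. KVM's decision depends only on host properties. Userspace decides whether the per-vCPU configuration it built is sane enough to advertise. All names here (`struct host_tsc_caps`, `kvm_advertise_tsc_freq()`, `vmm_tsc_khz_to_advertise()`) are hypothetical and are not existing KVM or VMM code. The host capability bits are passed in as plain booleans rather than read from CPUID or capability MSRs. The advertised feature itself is left abstract.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical host capabilities. A real implementation would derive these
 * from hardware (CPUID / VMX capability MSRs); this sketch takes them as input.
 */
struct host_tsc_caps {
	bool constant_tsc;    /* bare-metal TSC rate never changes */
	bool has_tsc_scaling; /* hardware can scale the guest's TSC rate */
};

/*
 * KVM side (sketch of the proposal): advertise the feature in the system-wide
 * KVM_GET_SUPPORTED_CPUID output iff the host TSC is constant and the CPU
 * supports TSC scaling. No per-VM or per-vCPU state is consulted.
 */
static bool kvm_advertise_tsc_freq(const struct host_tsc_caps *caps)
{
	return caps->constant_tsc && caps->has_tsc_scaling;
}

/*
 * Userspace side (sketch): put a frequency in every vCPU's CPUID only if KVM
 * advertised the feature and all vCPUs were configured with the same TSC
 * frequency. Per-vCPU differences are architecturally legal but would break
 * guests that assume one TSC rate on all CPUs.
 * Returns the common frequency in kHz, or 0 if nothing should be advertised.
 */
static uint32_t vmm_tsc_khz_to_advertise(bool kvm_advertises,
					 const uint32_t *vcpu_tsc_khz,
					 size_t nr_vcpus)
{
	assert(vcpu_tsc_khz != NULL || nr_vcpus == 0);

	if (!kvm_advertises || nr_vcpus == 0)
		return 0;

	for (size_t i = 1; i < nr_vcpus; i++) {
		if (vcpu_tsc_khz[i] != vcpu_tsc_khz[0])
			return 0; /* inconsistent config: don't advertise */
	}
	return vcpu_tsc_khz[0];
}
```

In this sketch, ignoring KVM's hint (the "userspace knows better" case) simply means passing `true` for `kvm_advertises`. The consistency check stays userspace's responsibility either way.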