On 8/19/2025 11:59 PM, Edgecombe, Rick P wrote:
On Tue, 2025-08-19 at 13:40 +0800, Binbin Wu wrote:
Currently, KVM TDX code filters out TSX (HLE or RTM) and WAITPKG using
tdx_clear_unsupported_cpuid(), which is a sort of blacklist.
I am wondering if we could add another array, e.g., tdx_cpu_caps[], which is the
TDX version of kvm_cpu_caps[].
Using tdx_cpu_caps[] would be the whitelist approach.
We had something like this in some of the earlier revisions of the TDX CPUID
configuration.
For a new feature
- If the developer doesn't know anything about TDX, the bit is just added to
kvm_cpu_caps[].
- If the developer knows that the feature is supported by both non-TDX VMs and TDs
(either the feature doesn't require any additional virtualization support or
the virtualization support is added for TDX), extend the macros to set the bit
both in kvm_cpu_caps[] and tdx_cpu_caps[].
- If there is a feature not supported by non-TDX VMs, but supported by TDs,
extend the macros to set the bit only in tdx_cpu_caps[].
So, tdx_cpu_caps[] could be used as the filter of configurable bits reported
to userspace.
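The three cases in the list above could be sketched roughly as follows. This is a standalone illustration, not kernel code: the array sizing, the helper names (cap_kvm_only(), cap_both(), cap_tdx_only(), tdx_filter_configurable()) and the word/bit indices are all hypothetical; only kvm_cpu_caps[] and tdx_cpu_caps[] are names taken from the proposal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizing: one 32-bit word per CPUID feature register. */
#define NR_CAP_WORDS 4

static uint32_t kvm_cpu_caps[NR_CAP_WORDS]; /* allowed for non-TDX VMs */
static uint32_t tdx_cpu_caps[NR_CAP_WORDS]; /* allowed for TDs (the proposed array) */

static void cap_set(uint32_t *caps, unsigned int word, unsigned int bit)
{
	caps[word] |= UINT32_C(1) << bit;
}

static bool cap_has(const uint32_t *caps, unsigned int word, unsigned int bit)
{
	return (caps[word] >> bit) & 1u;
}

/* Case 1: developer knows nothing about TDX, so only kvm_cpu_caps[] is set. */
static void cap_kvm_only(unsigned int word, unsigned int bit)
{
	cap_set(kvm_cpu_caps, word, bit);
}

/* Case 2: feature is supported by both non-TDX VMs and TDs. */
static void cap_both(unsigned int word, unsigned int bit)
{
	cap_set(kvm_cpu_caps, word, bit);
	cap_set(tdx_cpu_caps, word, bit);
}

/* Case 3: feature is supported by TDs only. */
static void cap_tdx_only(unsigned int word, unsigned int bit)
{
	cap_set(tdx_cpu_caps, word, bit);
}

/*
 * Whitelist filter: of the bits reported as configurable for one word,
 * keep only those present in tdx_cpu_caps[].
 */
static uint32_t tdx_filter_configurable(unsigned int word, uint32_t configurable)
{
	return configurable & tdx_cpu_caps[word];
}
```

With a whitelist like this, a bit that nobody registered for TDX is dropped by default rather than leaking through.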
In some ways this is the simplest, but having to maintain a big list in KVM was
not ideal.
Agree.
The original solution started with KVM_GET_SUPPORTED_CPUID and then
massaged the results to fit, so maybe just encoding the whole thing separately
is enough to reconsider it.
But what I was thinking is that we could move most of that hardcoded list into the
TDX module, and only keep a list of non-trivial features (i.e. not simple
instruction CPUID bits) in KVM. The list of simple features (definition TBD)
could be provided by the TDX module.
It sounds like a good idea.
Either a list of simple features, or the opposite version is OK.
TDX module already provided the interface to get directly configurable bits.
VMM can get the other part by masking.
But providing a list of non-trivial features may be more direct.
I think non-trivial features should cover both cases:
- a feature that clobbers host state
- a feature that requires additional para-virtualization support in the VMM, e.g.,
the feature-related MSR(s) should be virtualized by the VMM. Without proper
para-virtualization support in the VMM, the guest will experience functional
issues when using the feature.
So KVM could do the full filtering but only
keep a list that today would just look like TSX and WAITPKG that we already
have. So basically the same as what you are proposing, but it just shrinks the size
of the list KVM has to keep.
Compared to a blacklist (i.e., tdx_clear_unsupported_cpuid()), there is no risk
of forgetting to add a feature that TDX does not support to the blacklist.
Also, tdx_cpu_caps[] could support a feature that is not supported for non-TDX VMs.
We definitely can't have the TDX module adding any host-affecting features that we
would automatically allow. And having a separate opt-in interface that doesn't
"speak" cpuid bits is going to just complicate the already complicated logic
that is in QEMU.
With the list of non-trivial features, VMM can prevent userspace from setting
any bit in the list not supported by VMM.
So can KVM enforce consistency only for the non-trivial feature bits? After all,
these are the bits that really matter from KVM's point of view.
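As a rough sketch of what enforcing only the non-trivial bits might look like for a single CPUID register: the function names and the three masks are hypothetical, and how the non-trivial mask would actually be obtained (e.g. reported by the TDX module) is exactly the open question in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * requested:     bits userspace wants to set for the TD
 * nontrivial:    bits classified as non-trivial (assumed to be provided somehow)
 * vmm_supported: non-trivial bits the VMM knows how to handle
 *
 * Returns the requested bits that must be rejected: non-trivial bits
 * the VMM does not support. Trivial bits are never rejected here.
 */
static uint32_t cpuid_rejected_bits(uint32_t requested, uint32_t nontrivial,
				    uint32_t vmm_supported)
{
	return requested & nontrivial & ~vmm_supported;
}

static bool cpuid_request_ok(uint32_t requested, uint32_t nontrivial,
			     uint32_t vmm_supported)
{
	return cpuid_rejected_bits(requested, nontrivial, vmm_supported) == 0;
}
```

The point of the shape is that the VMM only needs to carry vmm_supported, which today would be a very short list.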
If letting userspace, KVM and the TDX module have a consistent view of CPUIDs for a
TD is still a target, then when a new fixed1 bit is added in a new TDX spec, it still
requires an opt-in interface to allow userspace to get the full picture. Also,
userspace doesn't know which opt-in options are available unless the TDX module
provides another interface to report them... yeah, very complicated :(
Ideally, if the TDX module never adds a new fixed1 bit (whether newly defined or
converted from another type), or converts a fixed1 bit to a fixed0 bit, then
userspace can calculate the right fixed1 bits based on the base spec and the
directly configurable bits without a separate opt-in interface.
Then we don't need a host opt-in for the directly configurable bits that don't
clobber host state.
Of course, to prevent userspace from setting a feature bit that would clobber host
state but is not included in tdx_cpu_caps[], I think a new feature that would
clobber host state should require a host opt-in to the TDX module.
Yes, but if we have some way to get the host clobbering type info programmatically
we could keep the host opt-in as part of the main CPUID bit configuration. What
I think will be bad is if we grow a separate protocol of opt-ins. KVM and QEMU
manage everything with CPUID, so it will be easier if we stick to that.
Agree.