On Tue, May 20, 2025, Kai Huang wrote: > On Mon, 2025-05-19 at 08:39 -0700, Sean Christopherson wrote: > > +static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, > > + enum pg_level level, kvm_pfn_t pfn) > > { > > struct page *page = pfn_to_page(pfn); > > int ret; > > @@ -3507,10 +3507,14 @@ int __init tdx_bringup(void) > > r = __tdx_bringup(); > > if (r) { > > /* > > - * Disable TDX only but don't fail to load module if > > - * the TDX module could not be loaded. No need to print > > - * message saying "module is not loaded" because it was > > - * printed when the first SEAMCALL failed. > > + * Disable TDX only but don't fail to load module if the TDX > > + * module could not be loaded. No need to print message saying > > + * "module is not loaded" because it was printed when the first > > + * SEAMCALL failed. Don't bother unwinding the S-EPT hooks or > > + * vm_size, as kvm_x86_ops have already been finalized (and are > > + * intentionally not exported). The S-EPT code is unreachable, > > + * and allocating a few more bytes per VM in a should-be-rare > > + * failure scenario is a non-issue. > > */ > > if (r == -ENODEV) > > goto success_disable_tdx; > > @@ -3524,3 +3528,19 @@ int __init tdx_bringup(void) > > enable_tdx = 0; > > return 0; > > } > > + > > + > > +void __init tdx_hardware_setup(void) > > +{ > > + /* > > + * Note, if the TDX module can't be loaded, KVM TDX support will be > > + * disabled but KVM will continue loading (see tdx_bringup()). > > + */ > > This comment seems a little bit weird to me. I think what you meant here is the > @vm_size and those S-EPT ops are not unwound while TDX cannot be brought up but > KVM is still loaded. This comment is weird? Or the one in tdx_bringup() is weird? The sole intent of _this_ comment is to clarify that KVM could still end up running load with TDX disabled. The comment about not unwinding S-EPT resides in tdx_bringup(), because that's where the actual decision to not reject KVM load and to not undo the setup lives. > > + > > + vt_x86_ops.link_external_spt = tdx_sept_link_private_spt; > > + vt_x86_ops.set_external_spte = tdx_sept_set_private_spte; > > + vt_x86_ops.free_external_spt = tdx_sept_free_private_spt; > > + vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte; > > + vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt; > > +} > > diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h > > index 51f98443e8a2..ca39a9391db1 100644 > > --- a/arch/x86/kvm/vmx/tdx.h > > +++ b/arch/x86/kvm/vmx/tdx.h > > @@ -8,6 +8,7 @@ > > #ifdef CONFIG_KVM_INTEL_TDX > > #include "common.h" > > > > +void tdx_hardware_setup(void); > > int tdx_bringup(void); > > void tdx_cleanup(void); > > > > There's a build error when CONFIG_KVM_INTEL_TDX is off: > > vmx/main.c: In function ‘vt_hardware_setup’: > vmx/main.c:34:17: error: implicit declaration of function ‘tdx_hardware_setup’; > did you mean ‘vmx_hardware_setup’? [-Wimplicit-function-declaration] > 34 | tdx_hardware_setup(); > | ^~~~~~~~~~~~~~~~~~ > | vmx_hardware_setup > > .. for which you need a stub for tdx_hardware_setup() when CONFIG_KVM_INTEL_TDX > is off. Not in kvm-x86/next, commit 907092bf7cbd ("KVM: VMX: Clean up and macrofy x86_ops") buried all of vt_hardware_setup() behind CONFIG_KVM_INTEL_TDX=y. > And one more thing: > > With the above patch, we still have below code in vt_init(): > > /* > * TDX and VMX have different vCPU structures. Calculate the > * maximum size/align so that kvm_init() can use the larger > * values to create the kmem_vcpu_cache. > */ > vcpu_size = sizeof(struct vcpu_vmx); > vcpu_align = __alignof__(struct vcpu_vmx); > if (enable_tdx) { > vcpu_size = max_t(unsigned, vcpu_size, > sizeof(struct vcpu_tdx)); > vcpu_align = max_t(unsigned, vcpu_align, > __alignof__(struct vcpu_tdx)); > kvm_caps.supported_vm_types |= BIT(KVM_X86_TDX_VM); > } > > It's kinda ugly too IMHO. > > Since we already have @vm_size in kvm_x86_ops, how about also adding vcpu_size > and vcpu_align to it? Then they can be treated in the same way as vm_size for > TDX. > > They are not needed for SVM, but it doesn't hurt that much? I'd rather not. vt_init() already needs to be aware of TDX, e.g. to call into tdx_bringup() in the first place. Shoving state into kvm_x86_ops that is only ever used in vt_init() (an __init function at that) isn't a net positive. Putting the fields in kvm_x86_init_ops would be better, but I still don't think the complexity and indirection is justified. Bleeding gory TDX details into the common code is undesirable, but I don't see the size of vcpu_tdx or the fact that enable_tdx==true means KVM_X86_TDX_VM is supported as being information with hiding. kvm_x86_ops.vm_size is a means to an end. E.g. if kvm_vcpu_cache didn't exist, then we'd probably want/need kvm_x86_ops.vcpu_size, but it does exist, so...