On Thu, 2025-06-05 at 16:01 +0300, kirill.shutemov@xxxxxxxxxxxxxxx wrote:
> On Fri, May 23, 2025 at 03:00:56PM +0300, kirill.shutemov@xxxxxxxxxxxxxxx wrote:
> > On Wed, May 14, 2025 at 12:00:17AM +0000, Huang, Kai wrote:
> > > On Mon, 2025-05-12 at 12:55 +0300, Kirill A. Shutemov wrote:
> > > > On Fri, May 09, 2025 at 09:25:58AM +0800, Yan Zhao wrote:
> > > > > On Thu, May 08, 2025 at 04:23:56PM +0300, Kirill A. Shutemov wrote:
> > > > > > On Tue, May 06, 2025 at 07:55:17PM +0800, Yan Zhao wrote:
> > > > > > > On Fri, May 02, 2025 at 04:08:24PM +0300, Kirill A. Shutemov wrote:
> > > > > > > > The functions kvm_x86_ops::link_external_spt() and
> > > > > > > > kvm_x86_ops::set_external_spte() are used to assign new memory to a VM.
> > > > > > > > When using TDX with Dynamic PAMT enabled, the assigned memory must be
> > > > > > > > covered by PAMT.
> > > > > > > >
> > > > > > > > The new function kvm_x86_ops::phys_prepare() is called before
> > > > > > > > link_external_spt() and set_external_spte() to ensure that the memory is
> > > > > > > > ready to be assigned to the virtual machine. In the case of TDX, it
> > > > > > > > makes sure that the memory is covered by PAMT.
> > > > > > > >
> > > > > > > > kvm_x86_ops::phys_prepare() is called in a context where struct kvm_vcpu
> > > > > > > > is available, allowing the implementation to allocate memory from a
> > > > > > > > per-VCPU pool.
> > > > > > > >
> > > > > > > Why not invoke phys_prepare() and phys_cleanup() in set_external_spte_present()?
> > > > > > > Or in tdx_sept_set_private_spte()/tdx_sept_link_private_spt()?
> > > > > >
> > > > > > Because the memory pool we allocated from is per-vcpu and we lost access
> > > > > > to vcpu by then. And not all callers provide vcpu.
> > > > > Maybe we can get vcpu via kvm_get_running_vcpu(), as in [1].
> > > > > Then for callers not providing vcpu (where vcpu is NULL), we can use per-KVM
> > > > > cache?
> > > >
> > > > Hm. I was not aware of kvm_get_running_vcpu().
> > > > Will play with it, thanks.
> > >
> > > I am not sure why per-vcpu cache matters.
> > >
> > > For non-leaf SEPT pages, AFAICT the "vcpu->arch.mmu_external_spt_cache" is just
> > > an empty cache, and eventually __get_free_page() is used to allocate in:
> > >
> > > 	sp->external_spt =
> > > 		kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_external_spt_cache);
> > >
> > > So why not we actually create a kmem_cache for it with an actual 'ctor', and we
> > > can call tdx_alloc_page() in that. This makes sure when the "external_spt" is
> > > allocated, the underneath PAMT entry is there.
> >
> > I looked closer to this and while it is good idea, but ctor in kmem_cache
> > cannot fail which makes this approach not viable.
> >
> > I guess we can a constructor directly into struct kvm_mmu_memory_cache.
> > Let me play with this.
>
> I failed to make it work.
>
> We need to have destructor paired with the constructor that would do
> PAMT-aware freeing. And redirect all free paths to it. It requires
> substantial rework. I don't think it worth the effort.
>
> Will do manual PAMT management for SPT in TDX code.

Thanks for the effort.

Maybe something below?

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index db8f33e4de62..48732270bff0 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -164,8 +164,10 @@ static inline bool is_mirror_sp(const struct kvm_mmu_page *sp)
 	return sp->role.is_mirror;
 }
 
-static inline void kvm_mmu_alloc_external_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+static inline int kvm_mmu_alloc_external_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
+	int r;
+
 	/*
 	 * external_spt is allocated for TDX module to hold private EPT mappings,
 	 * TDX module will initialize the page by itself.
@@ -173,6 +175,12 @@ static inline void kvm_mmu_alloc_external_spt(struct kvm_vcpu *vcpu, struct kvm_
 	 * KVM only interacts with sp->spt for private EPT operations.
 	 */
 	sp->external_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_external_spt_cache);
+
+	r = tdx_pamt_get(virt_to_page(sp->external_spt));
+	if (r)
+		free_page((unsigned long)sp->external_spt);
+
+	return r;
 }
 
 static inline gfn_t kvm_gfn_root_bits(const struct kvm *kvm, const struct kvm_mmu_page *root)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7f3d7229b2c1..2d3a716d9195 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -55,7 +55,10 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 
 static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
 {
-	free_page((unsigned long)sp->external_spt);
+	if (sp->external_spt) {
+		free_page((unsigned long)sp->external_spt);
+		tdx_pamt_put(virt_to_page(sp->external_spt));
+	}
 	free_page((unsigned long)sp->spt);
 	kmem_cache_free(mmu_page_header_cache, sp);
 }
@@ -1277,8 +1280,13 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 */
 		sp = tdp_mmu_alloc_sp(vcpu);
 		tdp_mmu_init_child_sp(sp, &iter);
-		if (is_mirror_sp(sp))
-			kvm_mmu_alloc_external_spt(vcpu, sp);
+		if (is_mirror_sp(sp)) {
+			r = kvm_mmu_alloc_external_spt(vcpu, sp);
+			if (r) {
+				tdp_mmu_free_sp(sp);
+				goto retry;
+			}
+		}
 
 		sp->nx_huge_page_disallowed = fault->huge_page_disallowed;