Re: [RFC, PATCH 08/12] KVM: x86/tdp_mmu: Add phys_prepare() and phys_cleanup() to kvm_x86_ops

"kirill.shutemov@xxxxxxxxxxxxxxx" <kirill.shutemov@xxxxxxxxxxxxxxx> · Fri, 6 Jun 2025 13:20:40 +0300

On Thu, Jun 05, 2025 at 10:21:46PM +0000, Huang, Kai wrote:
> On Thu, 2025-06-05 at 16:01 +0300, kirill.shutemov@xxxxxxxxxxxxxxx wrote:
> > On Fri, May 23, 2025 at 03:00:56PM +0300, kirill.shutemov@xxxxxxxxxxxxxxx wrote:
> > > On Wed, May 14, 2025 at 12:00:17AM +0000, Huang, Kai wrote:
> > > > On Mon, 2025-05-12 at 12:55 +0300, Kirill A. Shutemov wrote:
> > > > > On Fri, May 09, 2025 at 09:25:58AM +0800, Yan Zhao wrote:
> > > > > > On Thu, May 08, 2025 at 04:23:56PM +0300, Kirill A. Shutemov wrote:
> > > > > > > On Tue, May 06, 2025 at 07:55:17PM +0800, Yan Zhao wrote:
> > > > > > > > On Fri, May 02, 2025 at 04:08:24PM +0300, Kirill A. Shutemov wrote:
> > > > > > > > > The functions kvm_x86_ops::link_external_spt() and
> > > > > > > > > kvm_x86_ops::set_external_spte() are used to assign new memory to a VM.
> > > > > > > > > When using TDX with Dynamic PAMT enabled, the assigned memory must be
> > > > > > > > > covered by PAMT.
> > > > > > > > > 
> > > > > > > > > The new function kvm_x86_ops::phys_prepare() is called before
> > > > > > > > > link_external_spt() and set_external_spte() to ensure that the memory is
> > > > > > > > > ready to be assigned to the virtual machine. In the case of TDX, it
> > > > > > > > > makes sure that the memory is covered by PAMT.
> > > > > > > > > 
> > > > > > > > > kvm_x86_ops::phys_prepare() is called in a context where struct kvm_vcpu
> > > > > > > > > is available, allowing the implementation to allocate memory from a
> > > > > > > > > per-VCPU pool.
> > > > > > > > > 
> > > > > > > > Why not invoke phys_prepare() and phys_cleanup() in set_external_spte_present()?
> > > > > > > > Or in tdx_sept_set_private_spte()/tdx_sept_link_private_spt()?
> > > > > > > 
> > > > > > > Because the memory pool we allocate from is per-vcpu and we have
> > > > > > > lost access to the vcpu by then. And not all callers provide a vcpu.
> > > > > > Maybe we can get vcpu via kvm_get_running_vcpu(), as in [1].
> > > > > > Then for callers not providing vcpu (where vcpu is NULL), we can use per-KVM
> > > > > > cache? 
> > > > > 
> > > > > Hm. I was not aware of kvm_get_running_vcpu(). Will play with it, thanks.
> > > > 
> > > > I am not sure why per-vcpu cache matters.
> > > > 
> > > > For non-leaf SEPT pages, AFAICT the "vcpu->arch.mmu_external_spt_cache" is just
> > > > an empty cache, and eventually __get_free_page() is used to allocate in:
> > > > 
> > > >   sp->external_spt = 
> > > > 	kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_external_spt_cache);
> > > > 
> > > > So why don't we actually create a kmem_cache for it with an actual 'ctor', and
> > > > call tdx_alloc_page() in that?  This makes sure that when the "external_spt" is
> > > > allocated, the underlying PAMT entry is there.
> > > 
> > > I looked closer at this, and while it is a good idea, a ctor in kmem_cache
> > > cannot fail, which makes this approach not viable.
> > > 
> > > I guess we can add a constructor directly into struct kvm_mmu_memory_cache.
> > > Let me play with this.
> > 
> > I failed to make it work.
> > 
> > We need to have a destructor paired with the constructor that would do
> > PAMT-aware freeing, and redirect all free paths to it. It requires
> > substantial rework. I don't think it is worth the effort.
> > 
> > Will do manual PAMT management for SPT in TDX code.
> 
> Thanks for the effort.
> 
> Maybe something below?

With the help of kvm_get_running_vcpu(), I localized these manipulations to
the internals of the TDX code. No need to leak this to the TDP MMU.

phys_prepare() and phys_cleanup() are gone now.

https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=tdx/dpamt-huge&id=72394699b5454aac6c027accab6d94a52d88819b
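
For readers following the thread without the tree at hand, here is a rough
user-space sketch of the lookup pattern discussed above: take the per-vCPU
pool when a vCPU is running, otherwise fall back to a per-VM pool as Yan
suggested. This is not the code from the commit. Every type and function
name below (struct page_cache, get_running_vcpu(), pamt_backing_alloc() and
so on) is a hypothetical stand-in, and malloc() replaces the kernel
allocators.

```c
#include <stddef.h>
#include <stdlib.h>

#define CACHE_CAPACITY 4

/* Hypothetical stand-in for a pre-filled object cache (not the KVM struct). */
struct page_cache {
	void *objs[CACHE_CAPACITY];
	int nobjs;
};

/* Hypothetical stand-ins for the VM and vCPU objects. */
struct vm {
	struct page_cache vm_cache;	/* fallback when no vCPU is running */
};

struct vcpu {
	struct vm *vm;
	struct page_cache vcpu_cache;	/* topped up before the fault path */
};

/* Models "the vCPU currently loaded on this CPU"; NULL means none. */
static struct vcpu *running_vcpu;

/* Stand-in for kvm_get_running_vcpu(); may return NULL. */
static struct vcpu *get_running_vcpu(void)
{
	return running_vcpu;
}

/* Fill the cache up to 'min' objects; returns 0 on success, -1 on failure. */
static int cache_topup(struct page_cache *c, int min)
{
	if (min > CACHE_CAPACITY)
		return -1;
	while (c->nobjs < min) {
		void *p = malloc(64);

		if (!p)
			return -1;
		c->objs[c->nobjs++] = p;
	}
	return 0;
}

/* Pop one object from the cache, or NULL if the cache is empty. */
static void *cache_alloc(struct page_cache *c)
{
	return c->nobjs ? c->objs[--c->nobjs] : NULL;
}

/* Prefer the running vCPU's pool; fall back to the per-VM pool. */
static struct page_cache *pamt_cache_for(struct vm *vm)
{
	struct vcpu *vcpu = get_running_vcpu();

	if (vcpu && vcpu->vm == vm)
		return &vcpu->vcpu_cache;
	return &vm->vm_cache;
}

/* Callers pass only the VM; no vCPU pointer is threaded through. */
static void *pamt_backing_alloc(struct vm *vm)
{
	return cache_alloc(pamt_cache_for(vm));
}
```

The point of the pattern is that callers only need the VM pointer, so
nothing vCPU-specific has to be passed down from the generic MMU code.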

-- 
  Kiryl Shutsemau / Kirill A. Shutemov