On Tue, Aug 12, 2025 at 11:39 AM Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx> wrote:
>
> On Tue, 2025-08-12 at 09:15 -0700, Sean Christopherson wrote:
> > > I actually went down this path too, but the problem I hit was that TDX
> > > module wants the PAMT page size to match the S-EPT page size.
> >
> > Right, but over-populating the PAMT would just result in "wasted" memory,
> > correct? I.e. KVM can always provide more PAMT entries than are needed. Or am
> > I misunderstanding how dynamic PAMT works?
>
> Demote needs DPAMT pages in order to split the DPAMT. But "needs" is what I was
> hoping to understand better.
>
> I do think though, that we should consider premature optimization vs re-
> architecting DPAMT only for the sake of a short term KVM design. As in, if fault
> path managed DPAMT is better for the whole lazy accept way of things, it
> probably makes more sense to just do it upfront with the existing architecture.
>
> BTW, I think I untangled the fault path DPAMT page allocation code in this
> series. I basically moved the existing external page cache allocation to
> kvm/vmx/tdx.c. So the details of the top up and external page table cache
> happens outside of x86 mmu code. The top up structure comes from arch/x86 side
> of tdx code, so the cache can just be passed into tdx_pamt_get(). And from the
> MMU code's perspective there is just one type "external page tables". It doesn't
> know about DPAMT at all.
>
> So if that ends up acceptable, I think the main problem left is just this global
> lock. And it seems we have a simple solution for it if needed.
>
> >
> > In other words, IMO, reclaiming PAMT pages on-demand is also a premature
> > optimization of sorts, as it's not obvious to me that the host would actually
> > be able to take advantage of the unused memory.
>
> I was imagining some guestmemfd callback to setup DPAMT backing for all the
> private memory. Just leave it when it's shared for simplicity. Then cleanup
> DPAMT when the pages are freed from guestmemfd. The control pages could have
> their own path like it does in this series. But it doesn't seem supported.

IMO, tying the lifetime of guest_memfd folios to KVM ownership beyond the
memslot lifetime leaks more state into guest_memfd than needed, e.g. it will
prevent use cases where guest_memfd needs to be reused while handling reboot
of a confidential VM [1].

IMO, if avoidable, it's better not to have DPAMT, or other KVM arch-specific
state tracking in general, hooked up to guest_memfd folios, especially with
hugepage support and the whole-folio splitting/merging that needs to happen.
If you still need it, guest_memfd should be as stateless as possible, just
like we are pushing for SNP preparation tracking [2] to happen within KVM's
SNP code, and IMO any such tracking should ideally be cleaned up on memslot
unbinding.

[1] https://lore.kernel.org/kvm/CAGtprH9NbCPSwZrQAUzFw=4rZPA60QBM2G8opYo9CZxRiYihzg@xxxxxxxxxxxxxx/
[2] https://lore.kernel.org/kvm/20250613005400.3694904-2-michael.roth@xxxxxxx/
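
For readers mapping the top-up scheme Rick describes above onto KVM's
generic kvm_mmu_memory_cache helpers, a minimal sketch of the pattern
follows. This is only illustrative: tdx_pamt_get() is named in the series
but its signature here is assumed, and pamt_page_cache,
tdx_topup_pamt_cache() and tdx_nr_pamt_pages() are made-up names, not the
actual code.

#include <linux/kvm_host.h>

/*
 * Sketch only: top up a per-vCPU cache in sleepable context, then consume
 * pre-allocated pages in the fault path where allocation isn't allowed.
 */

/* Called before taking mmu_lock, where GFP_KERNEL allocations are fine. */
int tdx_topup_pamt_cache(struct kvm_vcpu *vcpu)
{
	/* Reserve enough pages to back the worst-case DPAMT update. */
	return kvm_mmu_topup_memory_cache(&vcpu->arch.pamt_page_cache,
					  tdx_nr_pamt_pages());
}

/* Called from the fault path: only pops pages that were topped up above. */
int tdx_pamt_get(struct kvm *kvm, hpa_t hpa,
		 struct kvm_mmu_memory_cache *cache)
{
	void *pamt_page = kvm_mmu_memory_cache_alloc(cache);

	if (!pamt_page)
		return -ENOMEM;

	/* ... hand pamt_page to the PAMT.ADD SEAMCALL for this range ... */
	return 0;
}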