On Wed, Aug 13, 2025 at 10:31:27PM -0700, Vishal Annapurve wrote:
> On Thu, Aug 7, 2025 at 2:46 AM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> >
> > From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> >
> > +static struct page *tdx_alloc_pamt_page_split(void *data)
> > +{
> > +       struct kvm *kvm = data;
> > +       void *p;
> > +
> > +       p = kvm_mmu_memory_cache_alloc(&kvm->arch.pamt_page_cache);
> > +       return virt_to_page(p);
> > +}
> > +
> >  static int tdx_spte_demote_private_spte(struct kvm *kvm, gfn_t gfn,
> > -                                        enum pg_level level, struct page *page)
> > +                                        enum pg_level level, struct page *page,
> > +                                        kvm_pfn_t pfn_for_gfn)
> >  {
> >         int tdx_level = pg_level_to_tdx_sept_level(level);
> > +       hpa_t hpa = pfn_to_hpa(pfn_for_gfn);
> >         struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> >         gpa_t gpa = gfn_to_gpa(gfn);
> >         u64 err, entry, level_state;
> > +       LIST_HEAD(pamt_pages);
> > +
> > +       tdx_pamt_get(page, PG_LEVEL_4K, tdx_alloc_pamt_page_split, kvm);
>
> This invocation needs a return value check.
Ack.

> > +       tdx_alloc_pamt_pages(&pamt_pages, tdx_alloc_pamt_page_split, kvm);
>
> IIUC tdx_pamt_get() will result in pamt_pages allocation above, so
> this step is not needed.
This step is to allocate pamt_pages for the guest 2MB page that needs
splitting. The above tdx_pamt_get() is for the EPT page to be added.
I'll add comments or update the param names for better clarity (see the
sketch at the end of this mail).

Regarding the absence of a return value check for tdx_alloc_pamt_pages():
I think it's because tdx_alloc_pamt_page_split() retrieves pages from the
pamt_page_cache via kvm_mmu_memory_cache_alloc(), which is guaranteed to
succeed (otherwise, there's a BUG_ON() in kvm_mmu_memory_cache_alloc()).

> >
> >         err = tdh_mem_page_demote(&kvm_tdx->td, gpa, tdx_level, page,
> > -                                 NULL, &entry, &level_state);
> > +                                 &pamt_pages, &entry, &level_state);
> >
> >         if (unlikely(tdx_operand_busy(err))) {
> >                 tdx_no_vcpus_enter_start(kvm);
> >                 err = tdh_mem_page_demote(&kvm_tdx->td, gpa, tdx_level, page,
> > -                                         NULL, &entry, &level_state);
> > +                                         &pamt_pages, &entry, &level_state);
> >                 tdx_no_vcpus_enter_stop(kvm);
> >         }
> >
> >         if (KVM_BUG_ON(err, kvm)) {
> > +               tdx_free_pamt_pages(&pamt_pages);
>
> If tdx_alloc_pamt_pages() is not needed then this can be dropped as well.
>
> > +               tdx_pamt_put(page, PG_LEVEL_4K);
> >                 pr_tdx_error_2(TDH_MEM_PAGE_DEMOTE, err, entry, level_state);
> >                 return -EIO;
> >         }
> > +
> > +       if (tdx_supports_dynamic_pamt(tdx_sysinfo))
> > +               atomic_set(tdx_get_pamt_refcount(hpa), PTRS_PER_PMD);
>
> Should this be
>         atomic_set(tdx_get_pamt_refcount(hpa), PTRS_PER_PMD - 1);
>
> as tdx_pamt_get would have increased the refcount by 1 already above?
This hpa is for the guest 2MB memory range. There shouldn't be any
increased pamt_refcount for this range before a successful demote.
So, atomic_set() to PTRS_PER_PMD looks correct, though atomic_add() seems
even safer.
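
I.e., something along the lines of the below (untested sketch, just to
illustrate the atomic_add() alternative; same names as in this patch):

        /*
         * Account for the PTRS_PER_PMD 4K mappings created by the demote.
         * atomic_add() preserves any unexpected pre-existing count instead
         * of silently overwriting it.
         */
        if (tdx_supports_dynamic_pamt(tdx_sysinfo))
                atomic_add(PTRS_PER_PMD, tdx_get_pamt_refcount(hpa));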
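
For the comments/param-name clarification mentioned above, a rough
(untested) sketch of how I plan to annotate the two allocation steps;
"ret" would be a new local:

        /*
         * PAMT backing for "page", i.e. the EPT page to be added by the
         * demote.  tdx_pamt_get() can fail, so check its return value.
         */
        ret = tdx_pamt_get(page, PG_LEVEL_4K, tdx_alloc_pamt_page_split, kvm);
        if (ret)
                return ret;

        /*
         * PAMT pages for the guest 2MB range being split into 4K mappings.
         * tdx_alloc_pamt_page_split() only pulls from the pre-topped-up
         * pamt_page_cache and cannot fail, hence no return value check.
         */
        tdx_alloc_pamt_pages(&pamt_pages, tdx_alloc_pamt_page_split, kvm);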
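
And for reference on the "guaranteed to succeed" point:
kvm_mmu_memory_cache_alloc() in virt/kvm/kvm_main.c looks roughly like the
below (quoted from memory, please double-check against your tree):

void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
{
        void *p;

        if (WARN_ON(!mc->nobjs))
                p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
        else
                p = mc->objects[--mc->nobjs];
        /* A topped-up cache never hands out a NULL object. */
        BUG_ON(!p);
        return p;
}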