On Mon, Apr 28, 2025 at 5:52 PM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
>
> On Mon, Apr 28, 2025 at 05:17:16PM -0700, Vishal Annapurve wrote:
> > On Wed, Apr 23, 2025 at 8:07 PM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> > >
> > > Increase folio ref count before mapping a private page, and decrease
> > > folio ref count after a mapping failure or successfully removing a private
> > > page.
> > >
> > > The folio ref count to inc/dec corresponds to the mapping/unmapping level,
> > > ensuring the folio ref count remains balanced after entry splitting or
> > > merging.
> > >
> > > Signed-off-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxx>
> > > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > > Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> > > ---
> > >  arch/x86/kvm/vmx/tdx.c | 19 ++++++++++---------
> > >  1 file changed, 10 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > index 355b21fc169f..e23dce59fc72 100644
> > > --- a/arch/x86/kvm/vmx/tdx.c
> > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > @@ -1501,9 +1501,9 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
> > >         td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa);
> > >  }
> > >
> > > -static void tdx_unpin(struct kvm *kvm, struct page *page)
> > > +static void tdx_unpin(struct kvm *kvm, struct page *page, int level)
> > >  {
> > > -       put_page(page);
> > > +       folio_put_refs(page_folio(page), KVM_PAGES_PER_HPAGE(level));
> > >  }
> > >
> > >  static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
> > > @@ -1517,13 +1517,13 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
> > >
> > >         err = tdh_mem_page_aug(&kvm_tdx->td, gpa, tdx_level, page, &entry, &level_state);
> > >         if (unlikely(tdx_operand_busy(err))) {
> > > -               tdx_unpin(kvm, page);
> > > +               tdx_unpin(kvm, page, level);
> > >                 return -EBUSY;
> > >         }
> > >
> > >         if (KVM_BUG_ON(err, kvm)) {
> > >                 pr_tdx_error_2(TDH_MEM_PAGE_AUG, err, entry, level_state);
> > > -               tdx_unpin(kvm, page);
> > > +               tdx_unpin(kvm, page, level);
> > >                 return -EIO;
> > >         }
> > >
> > > @@ -1570,10 +1570,11 @@ int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
> > >          * a_ops->migrate_folio (yet), no callback is triggered for KVM on page
> > >          * migration. Until guest_memfd supports page migration, prevent page
> > >          * migration.
> > > -        * TODO: Once guest_memfd introduces callback on page migration,
> > > -        *       implement it and remove get_page/put_page().
> > > +        * TODO: To support in-place-conversion in gmem in futre, remove
> > > +        *       folio_ref_add()/folio_put_refs().
> >
> > With necessary infrastructure in guest_memfd [1] to prevent page
> > migration, is it necessary to acquire extra folio refcounts? If not,
> > why not just cleanup this logic now?
>
> Though the old comment says acquiring the lock is for page migration, the other
> reason is to prevent the folio from being returned to the OS until it has been
> successfully removed from TDX.
>
> If there's an error during the removal or reclaiming of a folio from TDX, such
> as a failure in tdh_mem_page_remove()/tdh_phymem_page_wbinvd_hkid() or
> tdh_phymem_page_reclaim(), it is important to hold the page refcount within TDX.
>
> So, we plan to remove folio_ref_add()/folio_put_refs() in future, only invoking
> folio_ref_add() in the event of a removal failure.

In my opinion, the above scheme can be deployed with this series
itself. guest_memfd will not take away memory from TDX VMs without an
invalidation.
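Roughly, the removal path could then look something like the sketch
below -- illustrative only, not against any particular tree: the
signature is simplified, the wbinvd/reclaim steps are elided (their
failure paths would follow the same "pin only on failure" rule), and
the helper names/arguments are going off the quoted hunks:

static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
                                      enum pg_level level, struct page *page)
{
        int tdx_level = pg_level_to_tdx_sept_level(level);
        struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
        gpa_t gpa = gfn_to_gpa(gfn);
        u64 err, entry, level_state;

        err = tdh_mem_page_remove(&kvm_tdx->td, gpa, tdx_level, &entry,
                                  &level_state);
        if (KVM_BUG_ON(err, kvm)) {
                pr_tdx_error_2(TDH_MEM_PAGE_REMOVE, err, entry, level_state);
                /*
                 * Removal failed: take the references only now, so that
                 * guest_memfd cannot hand the folio back to the OS while
                 * the TDX module may still be using it.
                 */
                folio_ref_add(page_folio(page), KVM_PAGES_PER_HPAGE(level));
                return -EIO;
        }

        /* wbinvd/reclaim steps elided; on failure they would pin the same way. */

        tdx_clear_page(page, level);
        /* Success: no reference was taken at map time, so nothing to drop. */
        return 0;
}

tdx_sept_set_private_spte() correspondingly stops taking references up
front, and the failure path above stays per-level via
KVM_PAGES_PER_HPAGE(level), so a huge mapping that cannot be removed
keeps the whole folio pinned.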
folio_ref_add() will not work for memory not backed by page structs,
but that problem can be solved in the future, possibly by notifying
guest_memfd of certain ranges still being in use even after an
invalidation completes (a rough, purely hypothetical sketch of what
such a notification could look like is appended at the bottom of this
mail).

>
> > [1] https://git.kernel.org/pub/scm/virt/kvm/kvm.git/tree/virt/kvm/guest_memfd.c?h=kvm-coco-queue#n441
> >
> > > Only increase the folio ref count
> > > +        *       when there're errors during removing private pages.
> > >          */
> > > -       get_page(page);
> > > +       folio_ref_add(page_folio(page), KVM_PAGES_PER_HPAGE(level));
> > >
> > >         /*
> > >          * Read 'pre_fault_allowed' before 'kvm_tdx->state'; see matching
> > > @@ -1647,7 +1648,7 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
> > >                 return -EIO;
> > >
> > >         tdx_clear_page(page, level);
> > > -       tdx_unpin(kvm, page);
> > > +       tdx_unpin(kvm, page, level);
> > >         return 0;
> > >  }
> > >
> > > @@ -1727,7 +1728,7 @@ static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn,
> > >         if (tdx_is_sept_zap_err_due_to_premap(kvm_tdx, err, entry, level) &&
> > >             !KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm)) {
> > >                 atomic64_dec(&kvm_tdx->nr_premapped);
> > > -               tdx_unpin(kvm, page);
> > > +               tdx_unpin(kvm, page, level);
> > >                 return 0;
> > >         }
> > >
> > > --
> > > 2.43.2
> > >
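For the struct-page-less case mentioned above, a purely hypothetical
shape of that notification, just to illustrate the idea -- none of
these guest_memfd hooks exists today, the names and signatures below
are invented:

/*
 * Hypothetical guest_memfd interface, for illustration only (invented names):
 * a user such as TDX marks a GFN range as still held by hardware even after
 * an invalidation has completed, so guest_memfd does not free or reuse the
 * backing memory (which may have no struct pages) until the range is
 * released again.
 */
int kvm_gmem_hold_range(struct kvm_memory_slot *slot, gfn_t gfn,
                        unsigned long nr_pages);     /* invented */
void kvm_gmem_release_range(struct kvm_memory_slot *slot, gfn_t gfn,
                            unsigned long nr_pages); /* invented */

On a removal failure TDX would then call the hold variant for
KVM_PAGES_PER_HPAGE(level) pages instead of folio_ref_add(), and the
release variant once the pages are eventually reclaimed from the TDX
module.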