On Wed, Jun 11, 2025 at 2:52 AM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
>
> From: Sean Christopherson <seanjc@xxxxxxxxxx>
>
> Add sub-ioctl KVM_TDX_TERMINATE_VM to release the HKID prior to shutdown,
> which enables more efficient reclaim of private memory.
>
> Private memory is removed from MMU/TDP when guest_memfds are closed. If
> the HKID has not been released, the TDX VM is still in RUNNABLE state,
> so pages must be removed using "Dynamic Page Removal" procedure (refer
> TDX Module Base spec) which involves a number of steps:
>         Block further address translation
>         Exit each VCPU
>         Clear Secure EPT entry
>         Flush/write-back/invalidate relevant caches
>
> However, when the HKID is released, the TDX VM moves to TD_TEARDOWN state
> where all TDX VM pages are effectively unmapped, so pages can be reclaimed
> directly.
>
> Reclaiming TD Pages in TD_TEARDOWN State was seen to decrease the total
> reclaim time. For example:
>
>         VCPUs   Size (GB)   Before (secs)   After (secs)
>           4        18            72              24
>          32       107           517             134
>          64       400          5539             467
>
> Link: https://lore.kernel.org/r/Z-V0qyTn2bXdrPF7@xxxxxxxxxx
> Link: https://lore.kernel.org/r/aAL4dT1pWG5dDDeo@xxxxxxxxxx
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> Co-developed-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
> Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
> ---
>
> Changes in V4:
>
>         Drop TDX_FLUSHVP_NOT_DONE change. It will be done separately.
>         Use KVM_BUG_ON() instead of WARN_ON().
>         Correct kvm_trylock_all_vcpus() return value.
>
> Changes in V3:
>
>         Remove KVM_BUG_ON() from tdx_mmu_release_hkid() because it would
>         trigger on the error path from __tdx_td_init()
>
>         Put cpus_read_lock() handling back into tdx_mmu_release_hkid()
>
>         Handle KVM_TDX_TERMINATE_VM in the switch statement, i.e. let
>         tdx_vm_ioctl() deal with kvm->lock
> ....
>
> +static int tdx_terminate_vm(struct kvm *kvm)
> +{
> +       if (kvm_trylock_all_vcpus(kvm))
> +               return -EBUSY;
> +
> +       kvm_vm_dead(kvm);

With this, no more VM ioctls can be issued on this instance. How would
the userspace VMM clean up the memslots? Is the expectation that the
guest_memfd and VM fds are closed to actually reclaim the memory?

The ability to clean up memslots from userspace without closing the
VM/guest_memfd handles is useful for reusing the same guest_memfds for
the next boot iteration of the VM in case of reboot.

> +
> +       kvm_unlock_all_vcpus(kvm);
> +
> +       tdx_mmu_release_hkid(kvm);
> +
> +       return 0;
> +}
> +
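
For reference, my mental model of the userspace flow is roughly the sketch
below: the VMM issues the new sub-command through KVM_MEMORY_ENCRYPT_OP
before it closes the guest_memfd and VM fds. KVM_TDX_TERMINATE_VM and
struct kvm_tdx_cmd are taken from this series and the existing TDX uAPI;
the tdx_terminate() helper itself is illustrative only, not a claim about
how any particular VMM does it:

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>  /* KVM_MEMORY_ENCRYPT_OP, struct kvm_tdx_cmd,
                           * KVM_TDX_TERMINATE_VM (assumes uapi headers from
                           * a kernel with this patch applied) */

  /* Hypothetical helper: release the HKID so that closing the guest_memfd
   * and VM fds reclaims pages in TD_TEARDOWN state instead of going
   * through Dynamic Page Removal. */
  static int tdx_terminate(int vm_fd)
  {
          struct kvm_tdx_cmd cmd;

          memset(&cmd, 0, sizeof(cmd));
          cmd.id = KVM_TDX_TERMINATE_VM;  /* new sub-command from this patch */

          /* KVM_TDX_* sub-commands are issued via KVM_MEMORY_ENCRYPT_OP
           * on the VM fd. */
          return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
  }

If memslot deletion is meant to be part of that flow, it has to happen
before this call, because kvm_vm_dead() blocks all subsequent VM ioctls,
which is exactly the reuse-across-reboot concern above.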