On Thu, Apr 17, 2025, Adrian Hunter wrote: > From: Sean Christopherson <seanjc@xxxxxxxxxx> > > Add sub-ioctl KVM_TDX_TERMINATE_VM to release the HKID prior to shutdown, > which enables more efficient reclaim of private memory. > > Private memory is removed from MMU/TDP when guest_memfds are closed. If > the HKID has not been released, the TDX VM is still in RUNNABLE state, > so pages must be removed using "Dynamic Page Removal" procedure (refer > TDX Module Base spec) which involves a number of steps: > Block further address translation > Exit each VCPU > Clear Secure EPT entry > Flush/write-back/invalidate relevant caches > > However, when the HKID is released, the TDX VM moves to TD_TEARDOWN state > where all TDX VM pages are effectively unmapped, so pages can be reclaimed > directly. > > Reclaiming TD Pages in TD_TEARDOWN State was seen to decrease the total > reclaim time. For example: > > VCPUs Size (GB) Before (secs) After (secs) > 4 18 72 24 > 32 107 517 134 > 64 400 5539 467 > > [Adrian: wrote commit message, added KVM_TDX_TERMINATE_VM documentation, > and moved cpus_read_lock() inside kvm->lock for consistency as reported > by lockdep] /facepalm I over-thought this. We've had an long-standing battle with kvm_lock vs. cpus_read_lock(), but this is kvm->lock, not kvm_lock. /sigh > +static int tdx_terminate_vm(struct kvm *kvm) > +{ > + int r = 0; > + > + guard(mutex)(&kvm->lock); With kvm->lock taken outside cpus_read_lock(), just handle KVM_TDX_TERMINATE_VM in the switch statement, i.e. let tdx_vm_ioctl() deal with kvm->lock. > + cpus_read_lock(); > + > + if (!kvm_trylock_all_vcpus(kvm)) { > + r = -EBUSY; > + goto out; > + } > + > + kvm_vm_dead(kvm); > + kvm_unlock_all_vcpus(kvm); > + > + __tdx_release_hkid(kvm, true); > +out: > + cpus_read_unlock(); > + return r; > +} > + > int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) > { > struct kvm_tdx_cmd tdx_cmd; > @@ -2805,6 +2827,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) > if (tdx_cmd.hw_error) > return -EINVAL; > > + if (tdx_cmd.id == KVM_TDX_TERMINATE_VM) > + return tdx_terminate_vm(kvm); > + > mutex_lock(&kvm->lock); > > switch (tdx_cmd.id) { > -- > 2.43.0 >