On 19/04/25 04:12, Sean Christopherson wrote:
> On Thu, Apr 17, 2025, Adrian Hunter wrote:
>> From: Sean Christopherson <seanjc@xxxxxxxxxx>
>>
>> Add sub-ioctl KVM_TDX_TERMINATE_VM to release the HKID prior to shutdown,
>> which enables more efficient reclaim of private memory.
>>
>> Private memory is removed from MMU/TDP when guest_memfds are closed. If
>> the HKID has not been released, the TDX VM is still in RUNNABLE state,
>> so pages must be removed using the "Dynamic Page Removal" procedure (refer
>> to the TDX Module Base spec), which involves a number of steps:
>> 	Block further address translation
>> 	Exit each VCPU
>> 	Clear Secure EPT entry
>> 	Flush/write-back/invalidate relevant caches
>>
>> However, when the HKID is released, the TDX VM moves to TD_TEARDOWN state,
>> where all TDX VM pages are effectively unmapped, so pages can be reclaimed
>> directly.
>>
>> Reclaiming TD Pages in TD_TEARDOWN state was seen to decrease the total
>> reclaim time. For example:
>>
>> 	VCPUs	Size (GB)	Before (secs)	After (secs)
>> 	  4	   18		    72		     24
>> 	 32	  107		   517		    134
>> 	 64	  400		  5539		    467
>>
>> [Adrian: wrote commit message, added KVM_TDX_TERMINATE_VM documentation,
>> and moved cpus_read_lock() inside kvm->lock for consistency as reported
>> by lockdep]
>
> /facepalm
>
> I over-thought this. We've had a long-standing battle with kvm_lock vs.
> cpus_read_lock(), but this is kvm->lock, not kvm_lock. /sigh
>
>> +static int tdx_terminate_vm(struct kvm *kvm)
>> +{
>> +	int r = 0;
>> +
>> +	guard(mutex)(&kvm->lock);
>
> With kvm->lock taken outside cpus_read_lock(), just handle KVM_TDX_TERMINATE_VM
> in the switch statement, i.e. let tdx_vm_ioctl() deal with kvm->lock.

Ok, also cpus_read_lock() can go back where it was in __tdx_release_hkid().

But also in __tdx_release_hkid(), there is:

	if (KVM_BUG_ON(refcount_read(&kvm->users_count) && !terminate, kvm))
		return;

However, __tdx_td_init() calls tdx_mmu_release_hkid() on the error path,
so that check is not correct.
>
>> +	cpus_read_lock();
>> +
>> +	if (!kvm_trylock_all_vcpus(kvm)) {
>> +		r = -EBUSY;
>> +		goto out;
>> +	}
>> +
>> +	kvm_vm_dead(kvm);
>> +	kvm_unlock_all_vcpus(kvm);
>> +
>> +	__tdx_release_hkid(kvm, true);
>> +out:
>> +	cpus_read_unlock();
>> +	return r;
>> +}
>> +
>>  int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
>>  {
>>  	struct kvm_tdx_cmd tdx_cmd;
>> @@ -2805,6 +2827,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
>>  	if (tdx_cmd.hw_error)
>>  		return -EINVAL;
>>
>> +	if (tdx_cmd.id == KVM_TDX_TERMINATE_VM)
>> +		return tdx_terminate_vm(kvm);
>> +
>>  	mutex_lock(&kvm->lock);
>>
>>  	switch (tdx_cmd.id) {
>> --
>> 2.43.0
>>