On Thu, Mar 13, 2025 at 11:17 AM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
> ...
> == Problem ==
>
> Currently, Dynamic Page Removal is being used when the TD is being
> shutdown for the sake of having simpler initial code.
>
> This happens when guest_memfds are closed, refer kvm_gmem_release().
> guest_memfds hold a reference to struct kvm, so that VM destruction cannot
> happen until after they are released, refer kvm_gmem_release().
>
> Reclaiming TD Pages in TD_TEARDOWN State was seen to decrease the total
> reclaim time. For example:
>
>   VCPUs   Size (GB)   Before (secs)   After (secs)
>    4         18            72              24
>   32        107           517             134

If the time for reclaim grows linearly with memory size, then this is
still a very long time for TD cleanup (~21 minutes for a 1TB VM).

>
> Note, the V19 patch set:
>
>   https://lore.kernel.org/all/cover.1708933498.git.isaku.yamahata@xxxxxxxxx/
>
> did not have this issue because the HKID was released early, something that
> Sean effectively NAK'ed:
>
>   "No, the right answer is to not release the HKID until the VM is
>   destroyed."
>
>   https://lore.kernel.org/all/ZN+1QHGa6ltpQxZn@xxxxxxxxxx/

IIUC, Sean is suggesting to treat S-EPT page removal and page reclaim
separately. Through his proposal:
1) If userspace drops the last reference on the gmem inode before/after
   dropping the VM reference
   -> slow S-EPT removal and slow page reclaim
2) If memslots are removed before closing the gmem and dropping the VM
   reference
   -> slow S-EPT page removal and no page reclaim while the gmem is
      still around.

Reclaim should ideally happen when the host wants to use that memory,
i.e. in the following scenarios:
1) Truncation of private guest_memfd ranges
2) Conversion of private guest_memfd ranges to shared when supporting
   in-place conversion (this could also be deferred until the range is
   faulted in as shared).

Would it be possible for you to provide the split of the time spent in
slow S-EPT page removal vs page reclaim?

It might be worth exploring the possibility of parallelizing, or giving
userspace the flexibility to parallelize, both of these operations to
bring the cleanup time down (to be comparable with non-confidential VM
cleanup time, for example).
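For reference, the ~21 minute figure above comes from a simple linear extrapolation of the quoted measurements. A rough sketch of that arithmetic follows; it assumes reclaim time scales linearly with memory size and takes 1TB as 1024 GB, both of which are assumptions on my side rather than anything measured. The helper name is made up for illustration.

```python
# Measurements quoted above: size in GB -> reclaim time in seconds.
BEFORE = {18: 72, 107: 517}
AFTER = {18: 24, 107: 134}

TARGET_GB = 1024  # assumption: 1TB taken as 1024 GB


def extrapolate_secs(size_gb, secs, target_gb):
    """Scale a measured reclaim time linearly to target_gb (hypothetical helper)."""
    return secs / size_gb * target_gb


for label, samples in (("before", BEFORE), ("after", AFTER)):
    for size_gb, secs in samples.items():
        est = extrapolate_secs(size_gb, secs, TARGET_GB)
        print(f"{label}: {size_gb} GB sample -> {est / 60:.1f} min for 1TB")
```

The 107 GB "After" sample is the one that gives roughly 21 minutes; the "Before" samples extrapolate to over an hour under the same assumption.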