On 27/03/25 10:14, Vishal Annapurve wrote:
> On Thu, Mar 13, 2025 at 11:17 AM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
>> ...
>> == Problem ==
>>
>> Currently, Dynamic Page Removal is being used when the TD is being
>> shut down, for the sake of having simpler initial code.
>>
>> This happens when guest_memfds are closed, refer kvm_gmem_release().
>> guest_memfds hold a reference to struct kvm, so that VM destruction
>> cannot happen until after they are released.
>>
>> Reclaiming TD Pages in TD_TEARDOWN State was seen to decrease the
>> total reclaim time.  For example:
>>
>>     VCPUs  Size (GB)  Before (secs)  After (secs)
>>         4         18             72            24
>>        32        107            517           134
>
> If the time for reclaim grows linearly with memory size, then this is
> a significantly high value for TD cleanup (~21 minutes for a 1TB VM).
>
>>
>> Note, the V19 patch set:
>>
>>     https://lore.kernel.org/all/cover.1708933498.git.isaku.yamahata@xxxxxxxxx/
>>
>> did not have this issue because the HKID was released early, something
>> that Sean effectively NAK'ed:
>>
>>     "No, the right answer is to not release the HKID until the VM is
>>     destroyed."
>>
>>     https://lore.kernel.org/all/ZN+1QHGa6ltpQxZn@xxxxxxxxxx/
>
> IIUC, Sean is suggesting to treat S-EPT page removal and page reclaim
> separately. Under his proposal:

Thanks for looking at this!

It seems I have been using the term "reclaim" wrongly. Sorry! I am
talking about taking private memory away from the guest, not about what
happens to it subsequently.

When the TDX VM is in the "Runnable" state, taking private memory away
is slow (slow S-EPT removal).

When the TDX VM is in the "Teardown" state, taking private memory away
is faster (via a TDX SEAMCALL named TDH.PHYMEM.PAGE.RECLAIM, which is
where I picked up the term "reclaim"). A sketch contrasting the two
paths is at the end of this mail.

Once guest memory has been removed from the S-EPT, no further action is
needed to reclaim it. It belongs to KVM at that point.

guest_memfd memory can be added directly to the S-EPT; no intermediate
state or step is used. Any guest_memfd memory not given to the MMU
(S-EPT) can be freed directly if userspace/KVM wants to. Again, there is
no intermediate state or (reclaim) step.

> 1) If userspace drops the last reference on the gmem inode
> before/after dropping the VM reference
>    -> slow S-EPT removal and slow page reclaim

Currently, slow S-EPT removal happens when the file is released.

> 2) If memslots are removed before closing the gmem and dropping the VM
> reference
>    -> slow S-EPT page removal and no page reclaim while the gmem is
> around.
>
> Reclaim should ideally happen when the host wants to use that memory,
> i.e. in the following scenarios:
> 1) Truncation of private guest_memfd ranges
> 2) Conversion of private guest_memfd ranges to shared when supporting
> in-place conversion (could be deferred to the faulting in as shared as
> well).
>
> Would it be possible for you to provide the split of the time spent in
> slow S-EPT page removal vs page reclaim?

Based on what I wrote above, all the time is spent removing pages from
the S-EPT. Greater than 99% of shutdown time is spent in
kvm_gmem_release().

>
> It might be worth exploring the possibility of parallelizing, or
> giving userspace the flexibility to parallelize, both these operations
> to bring the cleanup time down (to be comparable with non-confidential
> VM cleanup time, for example).
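
For illustration, here is a minimal sketch of the two paths described
above. This is not the actual patch: td_is_teardown(), kick_all_vcpus()
and the tdh_*() wrappers are assumed names, loosely modelled on the
TDX-module SEAMCALL names, and the real KVM helpers and signatures
differ between versions of the series.

/*
 * Minimal sketch only, not the actual patch: contrast the fast
 * TD_TEARDOWN path with Dynamic Page Removal on a running TD.
 * td_is_teardown(), kick_all_vcpus() and these tdh_*() wrappers are
 * assumed names for the purpose of illustration.
 */
static int td_take_page_away(struct kvm_tdx *kvm_tdx, gpa_t gpa, hpa_t hpa)
{
        int r;

        if (td_is_teardown(kvm_tdx)) {
                /*
                 * Teardown state: a single SEAMCALL
                 * (TDH.PHYMEM.PAGE.RECLAIM) hands the page back to the
                 * host. No TLB tracking cycle is needed because the TD
                 * can no longer run.
                 */
                return tdh_phymem_page_reclaim(hpa);
        }

        /*
         * Runnable state (Dynamic Page Removal): block the mapping,
         * run a TLB tracking cycle (TDH.MEM.TRACK plus forcing all
         * vCPUs to exit the TD), then remove the page. This per-page
         * cycle is what dominates shutdown time today.
         */
        r = tdh_mem_range_block(kvm_tdx, gpa);
        if (r)
                return r;

        tdh_mem_track(kvm_tdx);         /* advance the TLB flush epoch */
        kick_all_vcpus(kvm_tdx);        /* make vCPUs exit and flush */

        return tdh_mem_page_remove(kvm_tdx, gpa);
}

The point of the series is to steer shutdown onto the first branch:
once the TD is in the Teardown state, every page can take the cheap
reclaim path instead of the per-page block/track/remove cycle, which is
where the 72 -> 24 and 517 -> 134 second improvements above come from.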