On Fri, Jul 11, 2025 at 2:18 PM Vishal Annapurve <vannapurve@xxxxxxxxxx> wrote:
>
> On Wed, Jul 9, 2025 at 6:30 PM Vishal Annapurve <vannapurve@xxxxxxxxxx> wrote:
> > > > 3) KVM should ideally associate the lifetime of backing
> > > > pagetables/protection tables/RMP tables with the lifetime of the
> > > > binding of memslots with guest_memfd.
> > >
> > > Again, please align your indentation.
> > >
> > > > - Today KVM SNP logic ties RMP table entry lifetimes with how
> > > > long the folios are mapped in guest_memfd, which I think should be
> > > > revisited.
> > >
> > > Why?  Memslots are ephemeral per-"struct kvm" mappings.  RMP entries
> > > and guest_memfd inodes are tied to the Virtual Machine, not to the
> > > "struct kvm" instance.
> >
> > IIUC, guest_memfd can only be accessed through the window of memslots,
> > and if there are no memslots I don't see the reason for memory still
> > being associated with the "virtual machine".  Likely because I am yet
> > to completely wrap my head around 'guest_memfd inodes are tied to the
> > Virtual Machine, not to the "struct kvm" instance', I need to spend
> > more time on this one.
>
> I see the benefits of tying inodes to the virtual machine and different
> guest_memfd files to different KVM instances.  This allows us to
> exercise intra-host migration usecases for TDX/SNP.  But I think this
> model doesn't allow us to reuse guest_memfd files for SNP VMs during
> reboot.
>
> Reboot scenario assuming reuse of the existing guest_memfd inode for
> the next instance:
> 1) Create a VM.
> 2) Create guest_memfd files that pin the KVM instance.
> 3) Create memslots.
> 4) Start the VM.
> 5) For reboot/shutdown, execute VM-specific termination (e.g.
>    KVM_TDX_TERMINATE_VM).
> 6) If allowed, delete the memslots.
> 7) Create a new VM instance.
> 8) Link the existing guest_memfd files to the new VM -> which creates
>    new files for the same inode.
> 9) Close the existing guest_memfd files and the existing VM.
> 10) Jump to step 3.
>
> The difference between SNP and TDX is that TDX memory ownership is
> limited to the duration the pages are mapped in the second-stage
> secure EPT tables, whereas SNP/RMP memory ownership lasts beyond
> memslots and effectively remains until folios are punched out from the
> guest_memfd filemap.  IIUC, CCA might follow suit with SNP in this
> regard, with the pfns populated in GPT entries.
>
> I don't have a sense of how critical this problem could be, but it
> would mean that on every reboot all large memory allocations have to
> be let go and reallocated.  For 1G support, we will be freeing
> guest_memfd pages using a background thread, which may add some delay
> in being able to free up the memory in time.
>
> Instead, if we did this:
> 1) Support creating guest_memfd files for a certain VM type that
>    allows KVM to dictate the behavior of the guest_memfd.
> 2) Tie the lifetime of KVM SNP/TDX memory ownership to guest_memfd and
>    memslot bindings.
>    - Each binding will increase a refcount on both the guest_memfd
>      file and KVM, so neither can go away while the binding exists.

I think if we can ensure that any guest_memfd-initiated interaction
with KVM is only for invalidation, is based on the binding, and happens
under filemap_invalidate_lock, then there is no need to pin KVM on each
binding: binding/unbinding itself is protected by
filemap_invalidate_lock, so KVM can't go away during invalidation.

> 3) For SNP/CCA, pfns are invalidated from RMP/GPT tables during unbind
>    operations, while for TDX, KVM will invalidate secure EPT entries.
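
To make the unbind-time invalidation in 3) and the locking argument
above a bit more concrete, below is a minimal sketch of what an unbind
path could look like under that scheme.  The names (struct
gmem_binding, kvm_gmem_invalidate_range()) are made up for illustration
and do not match the existing guest_memfd code; the point is only that
unbind and guest_memfd-initiated invalidation both serialize on
filemap_invalidate_lock(), so the binding's struct kvm cannot disappear
underneath an invalidation without a per-binding reference.

#include <linux/kvm_host.h>
#include <linux/list.h>
#include <linux/pagemap.h>	/* filemap_invalidate_lock() */
#include <linux/slab.h>

/* Hypothetical per-memslot binding tracked by the guest_memfd inode. */
struct gmem_binding {
	struct list_head list;
	struct kvm *kvm;
	struct kvm_memory_slot *slot;
	pgoff_t pgoff;
};

/*
 * Hypothetical arch hook: clears RMP/GPT entries for SNP/CCA, zaps
 * secure EPT entries for TDX.
 */
void kvm_gmem_invalidate_range(struct kvm *kvm, struct kvm_memory_slot *slot,
			       gfn_t start, gfn_t end);

static void gmem_unbind(struct inode *inode, struct gmem_binding *b)
{
	/*
	 * Both unbind and guest_memfd-initiated invalidation take
	 * filemap_invalidate_lock(), so the binding (and the struct kvm
	 * it points at) can't go away while an invalidation is walking
	 * the bindings -- no per-binding refcount on KVM needed.
	 */
	filemap_invalidate_lock(inode->i_mapping);

	/* Tear down memory ownership state before the binding disappears. */
	kvm_gmem_invalidate_range(b->kvm, b->slot, b->slot->base_gfn,
				  b->slot->base_gfn + b->slot->npages);

	list_del(&b->list);
	filemap_invalidate_unlock(inode->i_mapping);
	kfree(b);
}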
>
> This can allow us to decouple the memory lifecycle from the VM
> lifecycle and match the behavior of non-confidential VMs, where memory
> can outlast the VM.  Though this approach will mean a change in the
> intra-host migration implementation, as we would no longer need to
> differentiate guest_memfd files and inodes.
>
> That being said, I might be missing something here, and I don't have
> any data to back the criticality of this usecase for SNP and possibly
> CCA VMs.
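
For completeness, here is roughly what the reboot flow could look like
from userspace if guest_memfd were decoupled from the VM lifecycle as
proposed.  This is only a sketch of the proposed model: today the
guest_memfd file is created against, and tied to, a specific VM, so
re-binding the same fd to a new VM via KVM_SET_USER_MEMORY_REGION2 as
shown below is not possible as-is, and the confidential VM type and
VM-specific teardown details are elided.

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical reboot path assuming guest_memfd can outlive a VM and
 * be re-bound to the next instance purely via memslots.  Error
 * handling, the confidential VM type for KVM_CREATE_VM, and
 * VM-specific teardown (e.g. KVM_TDX_TERMINATE_VM) are omitted or
 * simplified.
 */
static int reboot_reusing_gmem(int kvm_fd, int old_vm_fd, int gmem_fd,
			       __u64 size)
{
	struct kvm_userspace_memory_region2 region = {
		.slot = 0,
		.flags = KVM_MEM_GUEST_MEMFD,
		.guest_phys_addr = 0,
		.memory_size = size,
		/* .userspace_addr for shared pages omitted for brevity. */
		.guest_memfd = gmem_fd,
		.guest_memfd_offset = 0,
	};
	int new_vm_fd;

	/*
	 * Tear down the old instance; the guest_memfd fd stays open, so
	 * its (potentially huge) allocations are not freed and refilled.
	 */
	close(old_vm_fd);

	/* Create the next instance and rebind the same guest_memfd. */
	new_vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0 /* VM type */);
	ioctl(new_vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);

	return new_vm_fd;
}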