On Wed, Jul 9, 2025 at 6:30 PM Vishal Annapurve <vannapurve@xxxxxxxxxx> wrote:
>
> > > 3) KVM should ideally associate the lifetime of backing
> > > pagetables/protection tables/RMP tables with the lifetime of the
> > > binding of memslots with guest_memfd.
> >
> > Again, please align your indentation.
> >
> > > - Today KVM SNP logic ties RMP table entry lifetimes with how
> > > long the folios are mapped in guest_memfd, which I think should be
> > > revisited.
> >
> > Why? Memslots are ephemeral per-"struct kvm" mappings. RMP entries and guest_memfd
> > inodes are tied to the Virtual Machine, not to the "struct kvm" instance.
>
> IIUC guest_memfd can only be accessed through the window of memslots
> and if there are no memslots I don't see the reason for memory still
> being associated with "virtual machine". Likely because I am yet to
> completely wrap my head around 'guest_memfd inodes are tied to the
> Virtual Machine, not to the "struct kvm" instance', I need to spend
> more time on this one.
>

I see the benefits of tying inodes to the virtual machine and
different guest_memfd files to different KVM instances. This allows us
to exercise intra-host migration use cases for TDX/SNP. But I think
this model doesn't allow us to reuse guest_memfd files for SNP VMs
during reboot.

Reboot scenario assuming reuse of the existing guest_memfd inode for
the next instance (a rough userspace sketch of this flow is at the
end of this mail):
1) Create a VM.
2) Create guest_memfd files that pin the KVM instance.
3) Create memslots.
4) Start the VM.
5) For reboot/shutdown, execute VM-specific termination
   (e.g. KVM_TDX_TERMINATE_VM).
6) If allowed, delete the memslots.
7) Create a new VM instance.
8) Link the existing guest_memfd files to the new VM, which creates
   new files for the same inode.
9) Close the existing guest_memfd files and the existing VM.
10) Jump to step 3.

The difference between SNP and TDX is that TDX memory ownership is
limited to the duration the pages are mapped in the second-stage
secure EPT tables, whereas SNP/RMP memory ownership lasts beyond the
memslots and effectively remains until the folios are punched out of
the guest_memfd filemap. IIUC CCA might follow suit with SNP in this
regard, with the pfns populated in GPT entries.

I don't have a sense of how critical this problem could be, but it
would mean that on every reboot all large memory allocations have to
be released and reallocated. For 1G support, we will be freeing
guest_memfd pages using a background thread, which may add some delay
in being able to free up the memory in time.

Instead, if we did this:
1) Support creating guest_memfd files for a certain VM type that
   allows KVM to dictate the behavior of the guest_memfd.
2) Tie the lifetime of KVM SNP/TDX memory ownership to guest_memfd
   and memslot bindings.
   - Each binding takes a refcount on both the guest_memfd file and
     KVM, so neither can go away while the binding exists (see the
     kernel-side sketch at the end of this mail).
3) For SNP/CCA, pfns are invalidated from the RMP/GPT tables during
   unbind operations, while for TDX, KVM invalidates the secure EPT
   entries.

This would allow us to decouple the memory lifecycle from the VM
lifecycle and match the behavior of non-confidential VMs, where
memory can outlast the VM. This approach would mean changes to the
intra-host migration implementation, though, as we would no longer
need to differentiate between guest_memfd files and inodes.

That being said, I might be missing something here, and I don't have
any data to back the criticality of this use case for SNP and
possibly CCA VMs.
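
For concreteness, a minimal userspace sketch of the reboot flow above.
KVM_CREATE_VM, KVM_CREATE_GUEST_MEMFD, KVM_SET_USER_MEMORY_REGION2 and
KVM_MEM_GUEST_MEMFD are the existing uAPI; KVM_LINK_GUEST_MEMFD and
struct kvm_link_guest_memfd in step 8 are placeholders for whatever
the link operation ends up looking like:

/* Sketch only, error handling elided. KVM_LINK_GUEST_MEMFD and
 * struct kvm_link_guest_memfd are hypothetical placeholders.
 */
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define GUEST_MEM_SIZE	(2UL << 30)

void reboot_flow(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);
	int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);	/* step 1 */

	struct kvm_create_guest_memfd gmem = {
		.size = GUEST_MEM_SIZE,
	};
	int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem); /* step 2 */

	struct kvm_userspace_memory_region2 region = {
		.slot = 0,
		.flags = KVM_MEM_GUEST_MEMFD,
		.guest_phys_addr = 0,
		.memory_size = GUEST_MEM_SIZE,
		.guest_memfd = gmem_fd,
	};
	ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region); /* step 3 */

	/* ... run the VM (step 4), VM-specific termination (step 5),
	 * delete the memslots (step 6) ...
	 */

	int new_vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);	/* step 7 */

	/* Hypothetical uAPI: new file, same inode, owned by the new VM. */
	struct kvm_link_guest_memfd link = { .fd = gmem_fd };
	int new_gmem_fd = ioctl(new_vm_fd, KVM_LINK_GUEST_MEMFD, &link); /* step 8 */

	close(gmem_fd);					/* step 9 */
	close(vm_fd);

	/* Back to step 3: recreate memslots against new_gmem_fd. */
	(void)new_gmem_fd;
}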
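To illustrate the SNP side of the ownership difference: with the
current model, deleting the memslots alone doesn't release RMP
ownership; the folios have to be truncated out of the filemap.
Continuing the sketch above (reusing gmem_fd and GUEST_MEM_SIZE, and
needing <linux/falloc.h>):

	/* Today, releasing SNP memory means punching the folios out
	 * of the guest_memfd filemap.
	 */
	fallocate(gmem_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		  0, GUEST_MEM_SIZE);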
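And a kernel-side sketch of what the binding in (2) could look like.
get_file()/fput() and kvm_get_kvm()/kvm_put_kvm() are the real
pinning primitives; struct gmem_binding and kvm_gmem_invalidate() are
made-up names for illustration:

#include <linux/file.h>
#include <linux/kvm_host.h>

/* Made-up structure: one binding per memslot<->guest_memfd range. */
struct gmem_binding {
	struct kvm *kvm;
	struct file *gmem_file;
	gfn_t base_gfn;
	pgoff_t pgoff;
	unsigned long nr_pages;
};

static void gmem_bind(struct gmem_binding *b, struct kvm *kvm,
		      struct file *file)
{
	b->kvm = kvm;
	b->gmem_file = file;
	kvm_get_kvm(kvm);	/* the VM can't go away while bound */
	get_file(file);		/* neither can the guest_memfd file */
}

static void gmem_unbind(struct gmem_binding *b)
{
	/*
	 * Drop CoCo ownership of the bound range here: invalidate
	 * RMP/GPT entries for SNP/CCA, zap secure EPT for TDX.
	 */
	kvm_gmem_invalidate(b);	/* hypothetical arch hook */

	fput(b->gmem_file);
	kvm_put_kvm(b->kvm);
}

With something like this, the last unbind (rather than the punch-hole
or the last fput() of the file) is what releases SNP/CCA ownership,
so the allocated folios can stay in the filemap across reboots.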