On Fri, Jun 20, 2025 at 7:24 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Jun 19, 2025, Adrian Hunter wrote:
> > On 19/06/2025 03:33, Sean Christopherson wrote:
> > > On Wed, Jun 18, 2025, Adrian Hunter wrote:
> > >> On 18/06/2025 09:00, Vishal Annapurve wrote:
> > >>> On Tue, Jun 17, 2025 at 10:50 PM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
> > >>>>> Ability to clean up memslots from userspace without closing
> > >>>>> VM/guest_memfd handles is useful to keep reusing the same guest_memfds
> > >>>>> for the next boot iteration of the VM in case of reboot.
> > >>>>
> > >>>> TD lifecycle does not include reboot. In other words, reboot is
> > >>>> done by shutting down the TD and then starting again with a new TD.
> > >>>>
> > >>>> AFAIK it is not currently possible to shut down without closing
> > >>>> guest_memfds since the guest_memfd holds a reference (users_count)
> > >>>> to struct kvm, and destruction begins when users_count hits zero.
> > >>>>
> > >>>
> > >>> gmem link support[1] allows associating existing guest_memfds with new
> > >>> VM instances.
> > >>>
> > >>> Breakdown of the userspace VMM flow:
> > >>> 1) Create a new VM instance before closing guest_memfd files.
> > >>> 2) Link existing guest_memfd files with the new VM instance. -> This
> > >>> creates new set of files backed by the same inode but associated with
> > >>> the new VM instance.
> > >>
> > >> So what about:
> > >>
> > >> 2.5) Call KVM_TDX_TERMINATE_VM IOCTL
> > >>
> > >> Memory reclaimed after KVM_TDX_TERMINATE_VM will be done efficiently,
> > >> so avoid causing it to be reclaimed earlier.
> > >
> > > The problem is that setting kvm->vm_dead will prevent (3) from succeeding. If
> > > kvm->vm_dead is set, KVM will reject all vCPU, VM, and device (not /dev/kvm the
> > > device, but rather devices bound to the VM) ioctls.
> >
> > (3) is "Close the older guest memfd handles -> results in older VM instance cleanup."
> >
> > close() is not an IOCTL, so I do not understand.
>
> Sorry, I misread that as "Close the older guest memfd handles by deleting the
> memslots".
>
> > > I intended that behavior, e.g. to guard against userspace blowing up KVM because
> > > the hkid was released, I just didn't consider the memslots angle.
> >
> > The patch was tested with QEMU which AFAICT does not touch memslots when
> > shutting down. Is there a reason to?
>
> In this case, the VMM process is not shutting down. To emulate a reboot, the
> VMM destroys the VM, but reuses the guest_memfd files for the "new" VM. Because
> guest_memfd takes a reference to "struct kvm", through memslot bindings, memslots

guest_memfd takes a reference on "struct kvm" only on creation/linking;
currently, memslot binding does not add additional references.

Adrian's suggestion makes sense and should be functional, but I am
running into some issues that likely need to be resolved on the
userspace side. I will keep this thread updated.

Currently testing this reboot flow:
1) Issue KVM_TDX_TERMINATE_VM on the old VM.
2) Close the VM fd.
3) Create a new VM fd.
4) Link the old guest_memfd handles to the new VM fd.
5) Close the old guest_memfd handles.
6) Register memslots on the new VM using the linked guest_memfd handles.

That being said, I still see the value in what Sean suggested:
"Remove vm_dead and instead reject ioctls based on vm_bugged, and
simply rely on KVM_REQ_VM_DEAD to prevent running the guest."

This will help with:
1) Keeping the cleanup sequence as close as possible to the normal VM
cleanup sequence.
2) Actual VM destruction happens at step 5 of the flow above; if any
cleanup happens asynchronously, userspace can enforce synchronous
cleanup by executing graceful VM shutdown stages before step 2.

And IIUC the goal here is to achieve exactly what Sean suggested above,
i.e. prevent running the guest after KVM_TDX_TERMINATE_VM is issued.
> need to be manually destroyed so that all references are put and the VM is freed
> by the kernel. E.g. otherwise multiple reboots would manifest as memory leaks
> and eventually OOM the host.