>Oof. And as Chao pointed out[*], removing the vm_dead check would allow creating >and running vCPUs in a dead VM, which is most definitely not desirable. Squashing >the vCPU creation case is easy enough if we keep vm_dead but still generally allow >ioctls, and it's probably worth doing that no matter what (to plug the hole where >pending vCPU creations could succeed): > >diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >index d477a7fda0ae..941d2c32b7dc 100644 >--- a/virt/kvm/kvm_main.c >+++ b/virt/kvm/kvm_main.c >@@ -4207,6 +4207,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id) > > mutex_lock(&kvm->lock); > >+ if (kvm->vm_dead) { >+ r = -EIO; >+ goto unlock_vcpu_destroy; >+ } >+ yes. this addresses my concern. > if (kvm_get_vcpu_by_id(kvm, id)) { > r = -EEXIST; > goto unlock_vcpu_destroy; > >And then to ensure vCPUs can't do anything, check KVM_REQ_VM_DEAD after acquiring >vcpu->mutex. > >diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >index 6c07dd423458..883077eee4ce 100644 >--- a/virt/kvm/kvm_main.c >+++ b/virt/kvm/kvm_main.c >@@ -4433,6 +4433,12 @@ static long kvm_vcpu_ioctl(struct file *filp, > > if (mutex_lock_killable(&vcpu->mutex)) > return -EINTR; >+ >+ if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu)) { >+ r = -EIO; >+ goto out; >+ } >+ > switch (ioctl) { > case KVM_RUN: { > struct pid *oldpid; > > >That should address all TDVPS paths (I hope), and I _think_ would address all >MMU-related paths as well? E.g. prefault requires a vCPU. > >Disallowing (most) vCPU ioctls but not all VM ioctls on vm_dead isn't great ABI >(understatement), but I think we need/want the above changes even if we keep the >general vm_dead restriction. And given the extremely ad hoc behavior of taking >kvm->lock for VM ioctls, trying to enforce vm_dead for "all" VM ioctls seems like >a fool's errand. > >So I'm leaning toward keeping "KVM: Reject ioctls only if the VM is bugged, not >simply marked dead" (with a different shortlog+changelog), but keeping vm_dead >(and not introducing kvm_tdx.vm_terminated). Sounds good to me. With kvm_tdx.vm_terminated removed, we should consider adding a comment above the is_hkid_assigned() check in tdx_sept_remove_private_spte() to clarify that !is_hkid_assigned() indicates the guest has been terminated, allowing private pages to be reclaimed directly without zapping.