On Mon, Jun 2, 2025 at 3:45 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > Reject migration of SEV{-ES} state if either the source or destination VM > is actively creating a vCPU, i.e. if kvm_vm_ioctl_create_vcpu() is in the > section between incrementing created_vcpus and online_vcpus. The bulk of > vCPU creation runs _outside_ of kvm->lock to allow creating multiple vCPUs > in parallel, and so sev_info.es_active can get toggled from false=>true in > the destination VM after (or during) svm_vcpu_create(), resulting in an > SEV{-ES} VM effectively having a non-SEV{-ES} vCPU. > > The issue manifests most visibly as a crash when trying to free a vCPU's > NULL VMSA page in an SEV-ES VM, but any number of things can go wrong. > > BUG: unable to handle page fault for address: ffffebde00000000 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 0 P4D 0 > Oops: Oops: 0000 [#1] SMP KASAN NOPTI > CPU: 227 UID: 0 PID: 64063 Comm: syz.5.60023 Tainted: G U O 6.15.0-smp-DEV #2 NONE > Tainted: [U]=USER, [O]=OOT_MODULE > Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024 > RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:206 [inline] > RIP: 0010:arch_test_bit arch/x86/include/asm/bitops.h:238 [inline] > RIP: 0010:_test_bit include/asm-generic/bitops/instrumented-non-atomic.h:142 [inline] > RIP: 0010:PageHead include/linux/page-flags.h:866 [inline] > RIP: 0010:___free_pages+0x3e/0x120 mm/page_alloc.c:5067 > Code: <49> f7 06 40 00 00 00 75 05 45 31 ff eb 0c 66 90 4c 89 f0 4c 39 f0 > RSP: 0018:ffff8984551978d0 EFLAGS: 00010246 > RAX: 0000777f80000001 RBX: 0000000000000000 RCX: ffffffff918aeb98 > RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffebde00000000 > RBP: 0000000000000000 R08: ffffebde00000007 R09: 1ffffd7bc0000000 > R10: dffffc0000000000 R11: fffff97bc0000001 R12: dffffc0000000000 > R13: ffff8983e19751a8 R14: ffffebde00000000 R15: 1ffffd7bc0000000 > FS: 0000000000000000(0000) GS:ffff89ee661d3000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: ffffebde00000000 CR3: 000000793ceaa000 CR4: 0000000000350ef0 > DR0: 0000000000000000 DR1: 0000000000000b5f DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Call Trace: > <TASK> > sev_free_vcpu+0x413/0x630 arch/x86/kvm/svm/sev.c:3169 > svm_vcpu_free+0x13a/0x2a0 arch/x86/kvm/svm/svm.c:1515 > kvm_arch_vcpu_destroy+0x6a/0x1d0 arch/x86/kvm/x86.c:12396 > kvm_vcpu_destroy virt/kvm/kvm_main.c:470 [inline] > kvm_destroy_vcpus+0xd1/0x300 virt/kvm/kvm_main.c:490 > kvm_arch_destroy_vm+0x636/0x820 arch/x86/kvm/x86.c:12895 > kvm_put_kvm+0xb8e/0xfb0 virt/kvm/kvm_main.c:1310 > kvm_vm_release+0x48/0x60 virt/kvm/kvm_main.c:1369 > __fput+0x3e4/0x9e0 fs/file_table.c:465 > task_work_run+0x1a9/0x220 kernel/task_work.c:227 > exit_task_work include/linux/task_work.h:40 [inline] > do_exit+0x7f0/0x25b0 kernel/exit.c:953 > do_group_exit+0x203/0x2d0 kernel/exit.c:1102 > get_signal+0x1357/0x1480 kernel/signal.c:3034 > arch_do_signal_or_restart+0x40/0x690 arch/x86/kernel/signal.c:337 > exit_to_user_mode_loop kernel/entry/common.c:111 [inline] > exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] > __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] > syscall_exit_to_user_mode+0x67/0xb0 kernel/entry/common.c:218 > do_syscall_64+0x7c/0x150 arch/x86/entry/syscall_64.c:100 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > RIP: 0033:0x7f87a898e969 > </TASK> > Modules linked in: gq(O) > gsmi: Log Shutdown Reason 0x03 > CR2: ffffebde00000000 > ---[ end trace 0000000000000000 ]--- > > Deliberately don't check for a NULL VMSA when freeing the vCPU, as crashing > the host is likely desirable due to the VMSA being consumed by hardware. > E.g. if KVM manages to allow VMRUN on the vCPU, hardware may read/write a > bogus VMSA page. Accessing PFN 0 is "fine"-ish now that it's sequestered > away thanks to L1TF, but panicking in this scenario is preferable to > potentially running with corrupted state. > > Reported-by: Alexander Potapenko <glider@xxxxxxxxxx> > Tested-by: Alexander Potapenko <glider@xxxxxxxxxx> > Fixes: 0b020f5af092 ("KVM: SEV: Add support for SEV-ES intra host migration") > Fixes: b56639318bb2 ("KVM: SEV: Add support for SEV intra host migration") > Cc: stable@xxxxxxxxxxxxxxx > Cc: James Houghton <jthoughton@xxxxxxxxxx> > Cc: Peter Gonda <pgonda@xxxxxxxxxx> > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> Thanks Sean! Free free to add: Reviewed-by: James Houghton <jthoughton@xxxxxxxxxx> > --- > arch/x86/kvm/svm/sev.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c > index a7a7dc507336..93d899454535 100644 > --- a/arch/x86/kvm/svm/sev.c > +++ b/arch/x86/kvm/svm/sev.c > @@ -2032,6 +2032,10 @@ static int sev_check_source_vcpus(struct kvm *dst, struct kvm *src) > struct kvm_vcpu *src_vcpu; > unsigned long i; > > + if (src->created_vcpus != atomic_read(&src->online_vcpus) || > + dst->created_vcpus != atomic_read(&dst->online_vcpus)) > + return -EINVAL; I think -EBUSY (or perhaps -EAGAIN) might be a more proper return code. > + > if (!sev_es_guest(src)) > return 0; > > -- > 2.49.0.1204.g71687c7c1d-goog >