Re: [RFC PATCH v1 1/5] x86/boot: Shift VMXON from KVM init to CPU startup phase

"Huang, Kai" <kai.huang@xxxxxxxxx> · Wed, 10 Sep 2025 11:35:35 +0000

On Wed, 2025-09-10 at 19:10 +0800, Chao Gao wrote:
> > > @@ -2551,6 +2636,12 @@ void __init arch_cpu_finalize_init(void)
> > >  	*c = boot_cpu_data;
> > >  	c->initialized = true;
> > >  
> > > +	/*
> > > +	 * Enable BSP virtualization right after the BSP cpuinfo_x86 structure
> > > +	 * is initialized to ensure this_cpu_has() works as expected.
> > > +	 */
> > > +	cpu_enable_virtualization();
> > > +
> > > 
> > 
> > Any reason that you chose to do it in arch_cpu_finalize_init()?  Perhaps
> > just an arch_initcall() or similar?
> > 
> > KVM has a specific CPUHP_AP_KVM_ONLINE to handle VMXON/OFF for CPU
> > online/offline.  And it's not in STARTUP section (which is not allowed to
> > fail) so it can handle the failure of VMXON.
> > 
> > How about adding a VMX specific CPUHP callback instead?
> > 
> > In this way, not only can we put all VMX related code together (e.g.,
> > arch/x86/virt/vmx/vmx.c) which is way easier to review/maintain, but also
> > we can still handle the failure of VMXON just like in KVM.
> 
> KVM's policy is that a CPU can be online if there is no VM running. 
> 

That is only the case when 'enable_virt_at_load' is off, right?  It
defaults to true.

> It is hard
> to implement/move the same logic inside the core kernel because the core kernel
> would need to refcount the running VMs. Any idea/suggestion on how to handle
> VMXON failure in the core kernel?

I think doing VMXON unconditionally at CPU bringup is a dramatic move at
this stage, so I was actually thinking that we don't do VMXON in the CPUHP
callback at all, and only do the preparation there: sanity checks, VMXON
region setup, etc.  If anything fails, we either refuse to online the CPU
or mark the CPU as not supporting VMX.
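
To make the "prepare only" idea concrete, here is a rough userspace model,
not kernel code.  Every name in it (vmx_cpu_prepare(), struct cpu_model,
the 'strict' flag) is made up for illustration, the feature check is a
plain bool, and aligned_alloc() stands in for allocating the VMXON region:

```c
/* Userspace model only; every name here is hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS			4
#define VMXON_REGION_SIZE	4096	/* modelled as one 4 KiB page */

struct cpu_model {
	bool has_vmx;		/* stands in for a CPU feature check */
	bool vmx_usable;	/* set only if preparation succeeded */
	void *vmxon_region;	/* stands in for the per-CPU VMXON region */
};

static struct cpu_model cpus[NR_CPUS];

/*
 * Preparation step for one CPU.  Does NOT execute VMXON.
 * strict == true:  a failure is returned, i.e. refuse to online the CPU.
 * strict == false: the CPU still comes online, but stays marked as
 *                  VMX-unusable.
 */
int vmx_cpu_prepare(int cpu, bool strict)
{
	struct cpu_model *c;
	int ret = 0;

	assert(cpu >= 0 && cpu < NR_CPUS);
	c = &cpus[cpu];
	c->vmx_usable = false;

	if (!c->has_vmx) {
		ret = -ENODEV;
	} else if (!c->vmxon_region) {
		c->vmxon_region = aligned_alloc(VMXON_REGION_SIZE,
						VMXON_REGION_SIZE);
		if (!c->vmxon_region)
			ret = -ENOMEM;
	}

	if (!ret) {
		c->vmx_usable = true;
		return 0;
	}

	return strict ? ret : 0;
}
```

Which of the two failure policies to pick is exactly the open question; the
'strict' flag is only there to show both.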

The core kernel then provides two APIs, one for VMXON and one for VMXOFF,
and KVM can use them.  The APIs need to handle concurrent requests from
multiple users, though.  VMCLEAR could still stay in KVM, since how vCPUs
are managed is KVM-internal.
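
One possible shape for the two APIs is a per-CPU user count, where only the
first user does VMXON and only the last one does VMXOFF.  Again a userspace
model with made-up names (vmx_get_cpu(), vmx_put_cpu()); VMXON/VMXOFF are
stubs, and a single pthread mutex stands in for whatever locking the real
thing would need:

```c
/* Userspace model only; every name here is hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NR_CPUS 4

static pthread_mutex_t vmx_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int vmx_users[NR_CPUS];	/* active requests per CPU */
static bool vmx_on[NR_CPUS];		/* models "CPU is in VMX operation" */

/* Stubs standing in for the real VMXON/VMXOFF instructions. */
static int stub_vmxon(int cpu)
{
	vmx_on[cpu] = true;
	return 0;
}

static void stub_vmxoff(int cpu)
{
	vmx_on[cpu] = false;
}

/* First user on a CPU does VMXON; later users only bump the count. */
int vmx_get_cpu(int cpu)
{
	int ret = 0;

	assert(cpu >= 0 && cpu < NR_CPUS);

	pthread_mutex_lock(&vmx_lock);
	if (vmx_users[cpu] == 0)
		ret = stub_vmxon(cpu);
	if (!ret)
		vmx_users[cpu]++;
	pthread_mutex_unlock(&vmx_lock);

	return ret;
}

/* Last user on a CPU does VMXOFF. */
void vmx_put_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	pthread_mutex_lock(&vmx_lock);
	assert(vmx_users[cpu] > 0);
	if (--vmx_users[cpu] == 0)
		stub_vmxoff(cpu);
	pthread_mutex_unlock(&vmx_lock);
}
```

With this, a failing VMXON is reported back to the caller of the "get"
side, so KVM could still handle it the way it does today.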

Does this make sense?