On Tue, Jun 17, 2025 at 09:10:10AM -0700, Sean Christopherson wrote: > On Tue, Jun 17, 2025, Naveen N Rao wrote: > > On Wed, Jun 11, 2025 at 03:45:15PM -0700, Sean Christopherson wrote: > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c > > > index ab228872a19b..f0a74b102c57 100644 > > > --- a/arch/x86/kvm/svm/avic.c > > > +++ b/arch/x86/kvm/svm/avic.c > > > @@ -277,9 +277,19 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu) > > > int id = vcpu->vcpu_id; > > > struct vcpu_svm *svm = to_svm(vcpu); > > > > > > + /* > > > + * Inhibit AVIC if the vCPU ID is bigger than what is supported by AVIC > > > + * hardware. Immediately clear apicv_active, i.e. don't wait until the > > > + * KVM_REQ_APICV_UPDATE request is processed on the first KVM_RUN, as > > > + * avic_vcpu_load() expects to be called if and only if the vCPU has > > > + * fully initialized AVIC. > > > + */ > > > if ((!x2avic_enabled && id > AVIC_MAX_PHYSICAL_ID) || > > > - (id > X2AVIC_MAX_PHYSICAL_ID)) > > > - return -EINVAL; > > > + (id > X2AVIC_MAX_PHYSICAL_ID)) { > > > + kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_TOO_BIG); > > > + vcpu->arch.apic->apicv_active = false; > > > > This bothers me a bit. kvm_create_lapic() does this: > > if (enable_apicv) { > > apic->apicv_active = true; > > kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu); > > } > > > > But, setting apic->apicv_active to false here means KVM_REQ_APICV_UPDATE > > is going to be a no-op. > > That's fine, KVM_REQ_APICV_UPDATE is a nop in other scenarios, too. I agree it's > not ideal, but this is a rather extreme edge case and a one-time slow path, so I > don't think it's worth doing anything special just to avoid KVM_REQ_APICV_UPDATE. > > > This does not look to be a big deal given that kvm_create_lapic() > > is called just a bit before svm_vcpu_create() (which calls the above > > function through avic_init_vcpu()) in kvm_arch_vcpu_create(), so there > > isn't that much done before apicv_active is toggled. > > > > But, this made me wonder if introducing a kvm_x86_op to check and > > enable/disable apic->apicv_active in kvm_create_lapic() might be cleaner > > overall > > Hmm, yes and no. I completely agree that clearing apicv_active in avic.c is all > kinds of gross, but clearing apic->apicv_active in lapic.c to handle this particular > scenario is just as problematic, because then avic_init_backing_page() would need > to check kvm_vcpu_apicv_active() to determine whether or not to allocate the backing > page. In a way, that's even worse, because setting apic->apicv_active by default > is purely an optimization, i.e. leaving it %false _should_ work as well, it would > just be suboptimal. But if AVIC were to key off apic->apicv_active, > that could > lead to KVM incorrectly skipping allocation of the AVIC backing page. I understand your concern about key'ing off apic->apicv_active - that would definitely require a thorough audit and does add complexity to this. However, as far as I can see, after your current series, we no longer maintain a pointer to the AVIC backing page, but instead rely on the lapic-allocated page. Were you referring to the APIC access page though? That is behind kvm_apicv_activated() today, which looks to be problematic if there are inhibits set during vcpu_create() and if those can be unset later? Shouldn't we be allocating the apic access page unconditionally here? > > > Maybe even have it be the initialization point for APICv: > > apicv_init(), so we can invoke avic_init_vcpu() right away? > > I mostly like this idea (actually, I take that back; see below), but VMX throws > a big wrench in things. Unlike SVM, VMX doesn't have a singular "enable APICv" > flag. Rather, KVM considers "APICv" to be the combination of APIC-register > virtualization, virtual interrupt delivery, and posted interrupts. > > Which is totally fine. The big wrench is that the are other features that interact > with "APICv" and require allocations. E.g. the APIC access page isn't actually > tied to enable_apicv, it's tied to yet another VMX feature, "virtualize APIC > accesses" (not to be confused with APIC-register virtualization; don't blame me, > I didn't create the control knobs/names). > > As a result, KVM may need to allocate the APIC access page (not to be confused > with the APIC *backing* page; again, don't blame me :-D) when enable_apicv=false, > and even more confusingly, NOT allocate the APIC access page when enable_apicv=true. > > if (cpu_need_virtualize_apic_accesses(vcpu)) { <=== not an enable_apic check, *sigh* > err = kvm_alloc_apic_access_page(vcpu->kvm); > if (err) > goto free_vmcs; > } Thanks for the background and the details there -- I am going to need some time to go unpack all of that :) > > And for both SVM and VMX, IPI virtualization adds another wrinkle, in that the > per-vCPU setup needs to fill an entry in a per-VM table. And filling that entry > needs to happen at the very end of vCPU creation, e.g. so that the vCPU can't be > targeted until KVM is ready to run the vCPU. > > Ouch. And I'm pretty sure there's a use-after-free in the AVIC code. If > svm_vcpu_alloc_msrpm() fails, the avic_physical_id_table[] will still have a pointer > to freed vAPIC page. Oops. > > Thus, invoking avic_init_vcpu() "right away" is unfortunately flat out unsafe > (took me a while to realize that). > > So while I 100% agree with your dislike of this patch, I think it's the least > awful band-aid, at least in the short term. > > Longer term, I'd definitely be in favor of cleaning up the related flows, but I > think that's best done on top of this series, because I suspect it'll be somewhat > invasive. Sure, that makes sense. For this patch: Acked-by: Naveen N Rao (AMD) <naveen@xxxxxxxxxx> Thanks, Naveen