On Thu, Aug 21, 2025 at 01:38:17PM -0700, Sean Christopherson wrote: > On Tue, Aug 19, 2025, Maciej S. Szmigiero wrote: > > From: "Maciej S. Szmigiero" <maciej.szmigiero@xxxxxxxxxx> > > > > When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8() is > > inhibited so any changed TPR in the LAPIC state would not get copied into > > the V_TPR field of VMCB. > > > > AVIC does sync between these two fields, however it does so only on > > explicit guest writes to one of these fields, not on a bare VMRUN. > > > > This is especially true when it is the userspace setting LAPIC state via > > KVM_SET_LAPIC ioctl() since userspace does not have access to the guest > > VMCB. > > > > Practice shows that it is the V_TPR that is actually used by the AVIC to > > decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI), > > so any leftover value in V_TPR will cause serious interrupt delivery issues > > in the guest when AVIC is enabled. > > > > Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in > > avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and > > similar code paths when AVIC is enabled. > > > > Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC") > > Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@xxxxxxxxxx> > > --- > > arch/x86/kvm/svm/avic.c | 23 +++++++++++++++++++++++ > > 1 file changed, 23 insertions(+) > > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c > > index a34c5c3b164e..877bc3db2c6e 100644 > > --- a/arch/x86/kvm/svm/avic.c > > +++ b/arch/x86/kvm/svm/avic.c > > @@ -725,8 +725,31 @@ int avic_init_vcpu(struct vcpu_svm *svm) > > > > void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu) > > { > > + struct vcpu_svm *svm = to_svm(vcpu); > > + u64 cr8; > > + > > avic_handle_dfr_update(vcpu); > > avic_handle_ldr_update(vcpu); > > + > > + /* Running nested should have inhibited AVIC. */ > > + if (WARN_ON_ONCE(nested_svm_virtualize_tpr(vcpu))) > > + return; > > > > + > > + /* > > + * Sync TPR from LAPIC TASKPRI into V_TPR field of the VMCB. > > + * > > + * When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8() > > + * is inhibited so any set TPR LAPIC state would not get reflected > > + * in V_TPR. > > Hmm, I think that code is straight up wrong. There's no justification, just a > claim: > > commit 3bbf3565f48ce3999b5a12cde946f81bd4475312 > Author: Suravee Suthikulpanit <Suravee.Suthikulpanit@xxxxxxx> > AuthorDate: Wed May 4 14:09:51 2016 -0500 > Commit: Paolo Bonzini <pbonzini@xxxxxxxxxx> > CommitDate: Wed May 18 18:04:31 2016 +0200 > > svm: Do not intercept CR8 when enable AVIC > > When enable AVIC: > * Do not intercept CR8 since this should be handled by AVIC HW. > * Also, we don't need to sync cr8/V_TPR and APIC backing page. <====== > > Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx> > [Rename svm_in_nested_interrupt_shadow to svm_nested_virtualize_tpr. - Paolo] > Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> > > That claim assumes APIC[TPR] will _never_ be modified by anything other than > hardware. It also isn't clear to me why only sync_lapic_to_cr8() was gated when AVIC was enabled, while sync_cr8_to_lapic() continues to copy V_TRP to the backing page. If AVIC is enabled, then the AVIC hardware updates both the backing page and V_TPR on a guest write to TPR. > That's obviously false for state restore from userspace, and it's also > technically false at steady state, e.g. if KVM managed to trigger emulation of a > store to the APIC page, then KVM would bypass the automatic harware sync. Do you mean emulation due to AVIC being inhibited? I initially thought this could be a problem, but in this scenario, AVIC would be disabled on the next VMRUN, so we will end up sync'ing TPR from the lapic to V_TPR. > > There's also the comically ancient KVM_SET_VAPIC_ADDR, which AFAICT appears to > be largely dead code with respect to vTPR (nothing sets KVM_APIC_CHECK_VAPIC > except for the initial ioctl), but could again set APIC[TPR] without updating > V_TPR. > > So, rather than manually do the update during state restore, my vote > is to restore the sync logic. And if we want to optimize that code > (seems unnecessary), then we should hook all TPR writes. I guess you mean apic_set_tpr()? We will need to hook into that in addition to updating avic_apicv_post_state_restore() since KVM_SET_LAPIC just does a memcpy of the register state. > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index d9931c6c4bc6..1bfebe40854f 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -4046,8 +4046,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu) > struct vcpu_svm *svm = to_svm(vcpu); > u64 cr8; > > - if (nested_svm_virtualize_tpr(vcpu) || > - kvm_vcpu_apicv_active(vcpu)) > + if (nested_svm_virtualize_tpr(vcpu)) > return; > > cr8 = kvm_get_cr8(vcpu); I agree that this is a simpler fix, so would be good to do for backport ease. The code in sync_lapic_to_cr8 ends up being a function call to kvm_get_cr8() and ~6 instructions, which isn't that much. But if we can gate sync'ing V_TPR to the backing page in sync_cr8_to_lapic() as well, then it might be good to do so. - Naveen