On Mon, Jun 23, 2025 at 09:18:45AM -0700, Sean Christopherson wrote: > On Mon, Jun 23, 2025, Naveen N Rao wrote: > > On Wed, Jun 11, 2025 at 03:45:23PM -0700, Sean Christopherson wrote: > > > Add a comment to explain why KVM clears IsRunning when putting a vCPU, > > > even though leaving IsRunning=1 would be ok from a functional perspective. > > > Per Maxim's experiments, a misbehaving VM could spam the AVIC doorbell so > > > fast as to induce a 50%+ loss in performance. > > > > > > Link: https://lore.kernel.org/all/8d7e0d0391df4efc7cb28557297eb2ec9904f1e5.camel@xxxxxxxxxx > > > Cc: Maxim Levitsky <mlevitsk@xxxxxxxxxx> > > > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> > > > --- > > > arch/x86/kvm/svm/avic.c | 31 ++++++++++++++++++------------- > > > 1 file changed, 18 insertions(+), 13 deletions(-) > > > > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c > > > index bf8b59556373..3cf929ac117f 100644 > > > --- a/arch/x86/kvm/svm/avic.c > > > +++ b/arch/x86/kvm/svm/avic.c > > > @@ -1121,19 +1121,24 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu) > > > if (!kvm_vcpu_apicv_active(vcpu)) > > > return; > > > > > > - /* > > > - * Unload the AVIC when the vCPU is about to block, _before_ > > > - * the vCPU actually blocks. > > > - * > > > - * Any IRQs that arrive before IsRunning=0 will not cause an > > > - * incomplete IPI vmexit on the source, therefore vIRR will also > > > - * be checked by kvm_vcpu_check_block() before blocking. The > > > - * memory barrier implicit in set_current_state orders writing > > > - * IsRunning=0 before reading the vIRR. The processor needs a > > > - * matching memory barrier on interrupt delivery between writing > > > - * IRR and reading IsRunning; the lack of this barrier might be > > > - * the cause of errata #1235). > > > - */ > > > + /* > > > + * Unload the AVIC when the vCPU is about to block, _before_ the vCPU > > > + * actually blocks. > > > + * > > > + * Note, any IRQs that arrive before IsRunning=0 will not cause an > > > + * incomplete IPI vmexit on the source; kvm_vcpu_check_block() handles > > > + * this by checking vIRR one last time before blocking. The memory > > > + * barrier implicit in set_current_state orders writing IsRunning=0 > > > + * before reading the vIRR. The processor needs a matching memory > > > + * barrier on interrupt delivery between writing IRR and reading > > > + * IsRunning; the lack of this barrier might be the cause of errata #1235). > > > + * > > > + * Clear IsRunning=0 even if guest IRQs are disabled, i.e. even if KVM > > > + * doesn't need to detect events for scheduling purposes. The doorbell > > > > Nit: just IsRunning (you can drop the =0 part). > > Hmm, not really. It could be: > > /* Note, any IRQs that arrive while IsRunning is set will not cause an > > or > > /* Note, any IRQs that arrive while IsRunning=1 will not cause an > > but that's just regurgitating the spec. The slightly more interesting scenario > that's being described here is what will happen if an IRQ arrives _just_ before > the below code toggle IsRunning from 1 => 0. Oh, you're right. I was referring to the last paragraph starting with "Clear IsRunning=0 even if", which to me feels like a double negation. It reads better if it is "Clear IsRunning even if ..." > > > Trying to understand the significance of IRQs being disabled here. Is > > that a path KVM tries to optimize? > > Yep. KVM doesn't need a notification for the undelivered (virtual) IRQ, because > it won't be handled by the vCPU until the vCPU enables IRQs, and thus it's not a > valid wake event for the vCPU. > > So, *if* spurious doorbells didn't affect performance or functionality, then > ideally KVM would leave IsRunning=1, e.g. so that the IOMMU doesn't need to > generate GA log events, and so that other vCPUs aren't forced to VM-Exit when > sending an IPI. Unfortunately, spurious doorbells are quite intrusive, and so > KVM "needs" to clear IsRunning. Ok, got that. kvm_arch_vcpu_blocking() looks like it exists precisely for this reason. Thanks, Naveen