Hi Joao, Sean, On 4/9/2025 5:26 PM, Joao Martins wrote: > On 04/04/2025 20:39, Sean Christopherson wrote: >> Add plumbing to the AMD IOMMU driver to allow KVM to control whether or >> not an IRTE is configured to generate GA log interrupts. KVM only needs a >> notification if the target vCPU is blocking, so the vCPU can be awakened. >> If a vCPU is preempted or exits to userspace, KVM clears is_run, but will >> set the vCPU back to running when userspace does KVM_RUN and/or the vCPU >> task is scheduled back in, i.e. KVM doesn't need a notification. >> >> Unconditionally pass "true" in all KVM paths to isolate the IOMMU changes >> from the KVM changes insofar as possible. >> >> Opportunistically swap the ordering of parameters for amd_iommu_update_ga() >> so that the match amd_iommu_activate_guest_mode(). > > Unfortunately I think this patch and the next one might be riding on the > assumption that amd_iommu_update_ga() is always cheap :( -- see below. > > I didn't spot anything else flawed in the series though, just this one. I would > suggest holding off on this and the next one, while progressing with the rest of > the series. > >> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c >> index 2e016b98fa1b..27b03e718980 100644 >> --- a/drivers/iommu/amd/iommu.c >> +++ b/drivers/iommu/amd/iommu.c >> -static void __amd_iommu_update_ga(struct irte_ga *entry, int cpu) >> +static void __amd_iommu_update_ga(struct irte_ga *entry, int cpu, >> + bool ga_log_intr) >> { >> if (cpu >= 0) { >> entry->lo.fields_vapic.destination = >> @@ -3783,12 +3784,14 @@ static void __amd_iommu_update_ga(struct irte_ga *entry, int cpu) >> entry->hi.fields.destination = >> APICID_TO_IRTE_DEST_HI(cpu); >> entry->lo.fields_vapic.is_run = true; >> + entry->lo.fields_vapic.ga_log_intr = false; >> } else { >> entry->lo.fields_vapic.is_run = false; >> + entry->lo.fields_vapic.ga_log_intr = ga_log_intr; >> } >> } >> > > isRun, Destination and GATag are not cached. Quoting the update from a few years > back (page 93 of IOMMU spec dated Feb 2025): > > | When virtual interrupts are enabled by setting MMIO Offset 0018h[GAEn] and > | IRTE[GuestMode=1], IRTE[IsRun], IRTE[Destination], and if present IRTE[GATag], > | are not cached by the IOMMU. Modifications to these fields do not require an > | invalidation of the Interrupt Remapping Table. > > This is the reason we were able to get rid of the IOMMU invalidation in > amd_iommu_update_ga() ... which sped up vmexit/vmenter flow with iommu avic. > Besides the lock contention that was observed at the time, we were seeing stalls > in this path with enough vCPUs IIRC; CCing Alejandro to keep me honest. > > Now this change above is incorrect as is and to make it correct: you will need > xor with the previous content of the IRTE::ga_log_intr and then if it changes > then you re-add back an invalidation command via > iommu_flush_irt_and_complete()). The latter is what I am worried will > reintroduce these above problem :( > > The invalidation command (which has a completion barrier to serialize > invalidation execution) takes some time in h/w, and will make all your vcpus > content on the irq table lock (as is). Even assuming you somehow move the > invalidation outside the lock, you will content on the iommu lock (for the> command queue) or best case assuming no locks (which I am not sure it is > possible) you will need to wait for the command to complete until you can > progress forward with entering/exiting. > > Unless the GALogIntr bit is somehow also not cached too which wouldn't need the > invalidation command (which would be good news!). Adding Suravee/Vasant here. I have checked with HW architects. Its not cached. So we don't need invalidation after updating GALogIntr field in IRTE. -Vasant