On 7/29/2025 8:59 PM, Jason Gunthorpe wrote:
On Tue, Jul 29, 2025 at 02:16:43PM +0800, Ethan Zhao wrote:
On 7/28/2025 12:20 AM, Jason Gunthorpe wrote:
On Sun, Jul 27, 2025 at 08:48:26PM +0800, Ethan Zhao wrote:
At least we can make some attempt in the DPC and hot-plug drivers, and then
push for a hardware specification update to provide pre-reset notification for
DPC & hotplug. Does it make sense?
I think DPC is a different case..
More complex and practical case.
I'm not sure about that, we do FLRs all the time as a normal part of
VFIO and VMM operations. DPC is pretty rare, IMHO.
A DPC reset can be triggered simply by accessing its control bit, which
is boring, while a data-corruption hardware issue is really rare.
If we get a DPC we should also push the iommu into blocking, disable
ATS and abandon any outstanding ATC invalidations as part of
recovering from the DPC. Once everything is cleaned up we can set the
Yup, even for pure software resets, there might be ATC invalidations pending
(in the software queue or HW queue).
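The three recovery steps named above (push the iommu into blocking, disable ATS, abandon outstanding ATC invalidations) can be pictured with a toy userspace model. This is only a sketch under stated assumptions: struct dpc_sketch_dev, the DPC_INV_* states and dpc_sketch_recover() are hypothetical names made up for illustration, not kernel APIs, and the fixed-size array stands in for whatever queue the iommu driver really uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-invalidation state; not a kernel type. */
enum dpc_inv_state { DPC_INV_IDLE, DPC_INV_IN_FLIGHT, DPC_INV_ABANDONED };

#define DPC_SKETCH_MAX_INV 8

/* Hypothetical model of a device that has just hit a DPC event. */
struct dpc_sketch_dev {
    bool blocking_attached;                     /* blocking domain attached */
    bool ats_enabled;                           /* ATS still enabled */
    enum dpc_inv_state inv[DPC_SKETCH_MAX_INV]; /* outstanding ATC invalidations */
};

/*
 * Model of the recovery steps: attach the blocking domain, disable ATS,
 * and abandon (rather than wait for) every in-flight ATC invalidation.
 * Returns the number of invalidations that were abandoned.
 */
static int dpc_sketch_recover(struct dpc_sketch_dev *dev)
{
    int abandoned = 0;

    assert(dev != NULL);
    dev->blocking_attached = true;
    dev->ats_enabled = false;

    for (size_t i = 0; i < DPC_SKETCH_MAX_INV; i++) {
        if (dev->inv[i] == DPC_INV_IN_FLIGHT) {
            dev->inv[i] = DPC_INV_ABANDONED;
            abandoned++;
        }
    }
    return abandoned;
}
```

The point of the model is that the DPC path gives up on in-flight invalidations instead of waiting for them, since the device may never answer.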
The design of this patch series will require the iommu driver to wait
for the in-flight ATC invalidations during the blocking domain
attach. So for the SW initiated resets there should not be pending ATC
invalidations when the FLR is triggered.
I see there is pci_wait_for_pending_transaction() before the blocking
domain attachment.
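To make the ordering for software-initiated resets concrete, here is a toy userspace model: in-flight ATC invalidations are drained while the blocking domain is attached, and only then is the FLR issued. It is a sketch, not the patch series' code; struct sketch_dev and the sketch_* helpers are hypothetical names for illustration, and the counter stands in for whatever the iommu driver actually tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device behind an IOMMU; not a kernel structure. */
struct sketch_dev {
    int  pending_atc_inv;   /* ATC invalidations still in flight */
    bool blocking_attached; /* true once the blocking domain is attached */
    bool flr_done;          /* true once the FLR has been issued */
};

/*
 * Attach the blocking domain. In this model the attach does not return
 * until every in-flight ATC invalidation has completed.
 */
static void sketch_attach_blocking(struct sketch_dev *dev)
{
    assert(dev != NULL);
    while (dev->pending_atc_inv > 0)
        dev->pending_atc_inv--;   /* stand-in for waiting on a completion */
    dev->blocking_attached = true;
}

/*
 * Issue the FLR. Returns 0 on success, or -1 if the blocking domain is
 * not attached or invalidations are still pending.
 */
static int sketch_issue_flr(struct sketch_dev *dev)
{
    assert(dev != NULL);
    if (!dev->blocking_attached || dev->pending_atc_inv != 0)
        return -1;
    dev->flr_done = true;
    return 0;
}

/* Software-initiated reset: block and drain first, then reset. */
static int sketch_sw_reset(struct sketch_dev *dev)
{
    sketch_attach_blocking(dev);
    return sketch_issue_flr(dev);
}
```

Calling sketch_issue_flr() directly on a device with pending invalidations fails in this model, while sketch_sw_reset() always reaches the FLR with the count at zero.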
We have been talking about DPC internally, and I think it will need a
related, but different flow since DPC can unavoidably trigger ATC
invalidation timeouts/failures and we must sensibly handle them in the
driver.
Jason
There is a race window for software to handle.
And since containing data corruption is the priority for DPC, it does not
seem rational to notify software first and then do the reset. An alternative
might be async mode support in the iommu ATC invalidation path?
Thanks,
Ethan