Re: [PATCH v3 5/5] pci: Suspend iommu function prior to resetting a device

Nicolin Chen <nicolinc@xxxxxxxxxx> · Fri, 22 Aug 2025 11:50:58 -0700

On Fri, Aug 22, 2025 at 11:08:21AM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 21, 2025 at 11:35:27PM -0700, Nicolin Chen wrote:
> > On Thu, Aug 21, 2025 at 10:07:41AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Aug 19, 2025 at 02:59:07PM -0700, Nicolin Chen wrote:
> > > >  c) multiple pci_devs with their own RIDs
> > > > 
> > > >     In this case, either FLR or IOMMU only resets the PF. That
> > > >     being said, VFs might be affected since PF is resetting?
> > > >     If there is an issue, I don't see it coming from the IOMMU-
> > > >     level reset..
> > > 
> > > It would still allow the ATS issue from the VF side. The VF could be
> > > pushing an invalidation during the PF reset that will get clobbered.
> > > 
> > > I haven't fully checked but I think Linux doesn't really (easily?)
> > > allow resetting a PF while a VF is present...
> > 
> > Hmm, what if the PF encountered some fault? Does Linux have a choice
> > not to reset PF?
> 
> Upon more reflection I guess outside VFIO I seem to remember the SRIOV
> reset to the PFs will clobber the VFs too and then restore the SRIOV
> configuration in config space to bring them back.

Yea, I see pci_restore_iov_state() called in pci_restore_state().

> > > Arguably if the PF is reset the VFs should have their translations
> > > blocked too.
> > 
> > Yea, that sounds plausible to me. But, prior to that (an IOMMU-level
> > reset), should VFs be first reset at the PCI level?
> 
> PF reset of a SRIOV PF disables the VFs and effectively resets them
> already.

Yea, I was expecting something like a SW reset routine, for each VF,
which would invoke iommu_dev_reset_prepare/_done() individually.

Without that, iommu_dev_reset_prepare/_done() has to iterate all VFs
internally and block them.
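
To illustrate what "iterate all VFs internally" would mean, here is a
rough userspace model. Everything in it (struct toy_fn, the physfn
back-pointer, toy_block_pf_and_vfs) is made up for illustration and is
not the actual PCI or IOMMU core code; it only shows the shape of a
prepare step that blocks a PF together with every VF hanging off it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PCI function; not a real kernel structure. */
struct toy_fn {
	struct toy_fn *physfn;	/* NULL for a PF, parent PF for a VF */
	bool blocked;		/* translation blocked for the reset */
};

/*
 * Block the PF and every VF whose parent is that PF.
 * Returns the number of functions that were blocked.
 */
size_t toy_block_pf_and_vfs(struct toy_fn *fns, size_t n, struct toy_fn *pf)
{
	size_t i, count = 0;

	assert(pf && !pf->physfn);	/* caller must pass a PF */

	for (i = 0; i < n; i++) {
		if (&fns[i] == pf || fns[i].physfn == pf) {
			fns[i].blocked = true;
			count++;
		}
	}
	return count;
}
```

The model leaves out the hard part raised below: the VF drivers are not
aware that their translation got blocked.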

> But reaching out to mangle the translation of the VFs means you do
> have to take care not to disrupt anything else the VF owning driver is
> doing since it is fully unaware of this. Ie it could be reattaching to
> something else concurrently.

Hmm, and this is tricky now..

The current version allows deferring the concurrent attach during a
reset. But, as Kevin pointed out, we may have no choice but to fail
any concurrent attach with -EBUSY, because a deferred attach might
fail due to incompatibility triggering a WARN_ON only in done().
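
A minimal userspace sketch of the -EBUSY variant, assuming a single
"resetting" flag per group. All model_* names are hypothetical and a
NULL domain simply stands in for a blocking domain; this is not the
code in the series:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real iommu structures. */
struct model_domain {
	int id;
};

struct model_group {
	struct model_domain *domain;	/* currently attached domain */
	struct model_domain *saved;	/* domain to restore after the reset */
	bool resetting;			/* true between prepare() and done() */
};

/* Remember the current domain and block translation for the reset. */
void model_reset_prepare(struct model_group *g)
{
	g->saved = g->domain;
	g->domain = NULL;		/* NULL models a blocking domain */
	g->resetting = true;
}

/* A concurrent attach is refused instead of being deferred. */
int model_attach(struct model_group *g, struct model_domain *d)
{
	if (g->resetting)
		return -EBUSY;
	g->domain = d;
	return 0;
}

/* Restore the domain that was attached before the reset. */
void model_reset_done(struct model_group *g)
{
	assert(g->resetting);
	g->domain = g->saved;
	g->saved = NULL;
	g->resetting = false;
}
```

With this shape done() only ever restores a domain that was already
attached once, so it has nothing new to validate. The cost moves to the
caller, which has to retry after seeing -EBUSY.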

This isn't likely a problem for PF, as we can expect its driver not
to do an insane concurrent attach during a reset. But it would be a
very sane case for a VF. So if its driver doesn't retry or defer an
EBUSY-ed attach properly, it would not be restored successfully..

It feels like we need a no-fail re-attach operation, or at least an
unlikely-to-fail one. I recall years ago we tried a can_attach op
to test the compatibility but it didn't get merged. Maybe we'd need
it so that a concurrent attach can test compatibility, allowing the
re-attach in iommu_dev_reset_done() to more likely succeed.
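
Roughly what I have in mind, as a userspace sketch only. The sketch_*
names, the "fmt" field and the supported_fmts bitmask are all
assumptions invented here to model compatibility; the real check would
be whatever the driver's attach path validates today:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these exist in the kernel. */
struct sketch_domain {
	unsigned int fmt;		/* made-up page table format id */
};

struct sketch_dev {
	unsigned int supported_fmts;	/* bitmask of formats the IOMMU accepts */
	struct sketch_domain *domain;	/* live attachment, NULL while blocked */
	struct sketch_domain *pending;	/* domain to re-attach in done() */
	bool resetting;
};

/* Side-effect-free compatibility test, in the spirit of a can_attach op. */
int sketch_can_attach(const struct sketch_dev *dev,
		      const struct sketch_domain *d)
{
	if (d->fmt >= 32 || !(dev->supported_fmts & (1u << d->fmt)))
		return -EINVAL;
	return 0;
}

void sketch_reset_prepare(struct sketch_dev *dev)
{
	dev->pending = dev->domain;
	dev->domain = NULL;		/* NULL models a blocking domain */
	dev->resetting = true;
}

int sketch_attach(struct sketch_dev *dev, struct sketch_domain *d)
{
	int ret = sketch_can_attach(dev, d);

	if (ret)
		return ret;		/* incompatible: fail now, not in done() */
	if (dev->resetting)
		dev->pending = d;	/* deferred, but already validated */
	else
		dev->domain = d;
	return 0;
}

void sketch_reset_done(struct sketch_dev *dev)
{
	assert(dev->resetting);
	dev->domain = dev->pending;	/* pre-validated, unlikely to fail */
	dev->pending = NULL;
	dev->resetting = false;
}
```

An incompatible domain is rejected at attach time, where the caller can
handle the error, rather than surfacing as a WARN_ON in done(). A real
attach could still fail for other reasons (e.g. allocation), so this
only gets us to "unlikely-to-fail", not "no-fail".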

Thanks
Nicolin