On Fri, Jun 13, 2025 at 02:27:09PM -0500, Bjorn Helgaas wrote:
> On Tue, Jun 10, 2025 at 01:30:45PM -0300, Jason Gunthorpe wrote:
> > On Tue, Jun 10, 2025 at 04:37:58PM +0100, Robin Murphy wrote:
> > > On 2025-06-09 7:45 pm, Nicolin Chen wrote:
> > > > Hi all,
> > > >
> > > > Per PCIe r6.3, sec 10.3.1 IMPLEMENTATION NOTE, software should disable ATS
> > > > before initiating a Function Level Reset, and then ensure no invalidation
> > > > requests being issued to a device when its ATS capability is disabled.
> > >
> > > Not really - what it says is that software should not expect to receive
> > > invalidate completions from a function which is in the process of being
> > > reset or powered off, and if software doesn't want to be confused by that
> > > then it should take care to wait for completion or timeout of all
> > > outstanding requests, and avoid issuing new requests, before initiating such
> > > a reset or power transition.
> >
> > The commit message can be more precise, but I agree with the
> > conclusion that the right direction for Linux is to disable and block
> > ATS, instead of trying to ignore completion time out events, or trying
> > to block page table mutations. Ie do what the implementation note
> > says..
> >
> > Maybe:
> >
> > PCIe permits a device to ignore ATS invalidation TLPs while it is
> > processing FLR. This creates a problem visible to the OS where ATS
> > invalidation commands will time out. For instance a SVA domain will
> > have no coordination with a FLR event and can racily issue ATC
> > invalidations into a resetting device.
>
> The sec 10.3.1 implementation note mentions FLR specifically, but it
> seems like *any* kind of reset would be vulnerable, e.g., SBR,
> external PERST# assert, etc?

Yes. I forgot to put a question mark in the cover letter, asking
whether the other reset routines would also need the same trick.

So, let's apply this to all the pci_reset_fn_methods.reset_fns?

Thanks
Nicolin