On Sun, Aug 31, 2025 at 03:43:35PM +0200, Lukas Wunner wrote: > On Wed, Aug 27, 2025 at 03:45:14PM -0700, Keith Busch wrote: > > Synchronize the interrupt to ensure the reset isn't going to disrupt a > > previously pending handler from igoring the reset's link flap. Back to > > back secondary bus resets create a window when the previous reset > > proceeds with DLLLA, waking the pending pciehp interrupt thread, but the > > subsequent reset tears it down while the irq thread tries to confirm the > > link is active, triggering unexpected re-enumeration. > > Help me understand this: > > I think what you mean is that pciehp_reset_slot() runs and the > Secondary Bus Reset causes a spurious link change. So pciehp_ist() runs, > waits for the reset to finish with pci_hp_spurious_link_change(), > then calls pciehp_ignore_link_change(), which tests whether the link > is active again by calling pciehp_check_link_active(). > > And you're saying that at the same time, pciehp_reset_slot() runs, > performs a Secondary Bus Reset, thus brings down the link, > confusing the concurrent pciehp_check_link_active(). > Did I understand that correctly? > > I don't quite see how this can happen, given pciehp_reset_slot() > acquires ctrl->reset_lock for writing and the same lock is held > for reading for the call to pciehp_check_link_active(). > > Moreover pciehp_ist() ignores the link flap in the first iteration > (it clears the flags in the local variable "events") and if > pciehp_check_link_active() would indeed fail, then the bits would > only be set in ctrl->pending_events and a rerun of pciehp_ist() > would be triggered. That second iteration of pciehp_ist() would > then find the PCI_LINK_CHANGED flag set and ignore the link change > (again). > > So this should all work fine. Hm, I think you're right. We are definitely seeing pciehp requeue itself with the link/presence events that we want to be ignored, so we're getting re-enumeration when we didn't expect it. I thought the back-to-back resets that we're causing vfio to initiate was the problem, but maybe not. I think the switch and/or end device we're using have some unusual link timings that defeats the pciehp ignore logic.