On Wed, Sep 03, 2025 at 10:21:34AM +0200, Lukas Wunner wrote:
> On Tue, Sep 02, 2025 at 11:59:11AM -0600, Keith Busch wrote:
> > Hm, I think you're right. We are definitely seeing pciehp requeue itself
> > with the link/presence events that we want to be ignored, so we're
> > getting re-enumeration when we didn't expect it. I thought the
> > back-to-back resets that we're causing vfio to initiate were the problem,
> > but maybe not. I think the switch and/or end device we're using have
> > some unusual link timings that defeat the pciehp ignore logic.
>
> pci_bridge_secondary_bus_reset() calls pci_bridge_wait_for_secondary_bus()
> to await Link Up. So unless the link flaps afterwards, this should be
> fine.
>
> Another possibility is that the pciehp_device_replaced() check triggers,
> e.g. because the Endpoint's Device Serial Number or other data in Config
> Space changed after the second reset.

That can happen, because we're using switches that insert a fake
"placeholder" device while a link is down.

> Maybe you can instrument the code with a few printk()'s to see what's
> going on.

But it looks like what we're seeing more frequently is the link not
becoming active.
Here are the existing messages printed:

[ 7904.749658] vfio-pci 0000:05:00.0: disabling bus mastering
[ 7904.756595] vfio-pci 0000:05:00.0: reset via bus
[ 7904.759975] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation
[ 7905.908987] vfio-pci 0000:05:00.0: ready 0ms after bus reset
[ 7905.909003] pcieport 0000:02:02.0: pciehp: Slot(314): Link Down/Up ignored
[ 7906.847973] vfio-pci 0000:05:00.0: resetting
[ 7906.856312] vfio-pci 0000:05:00.0: reset via bus
[ 7906.862967] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation
[ 7909.915925] pcieport 0000:02:02.0: Data Link Layer Link Active not set in 100 msec
[ 7909.915953] pcieport 0000:02:02.0: pciehp: Slot(314): Link Down/Up ignored
[ 7909.915977] pcieport 0000:02:02.0: pciehp: Slot(314): Link Down
[ 7909.915978] pcieport 0000:02:02.0: pciehp: Slot(314): Card not present
[ 7909.918934] pcieport 0000:02:02.0: waiting 100 ms for downstream link, after activation
[ 7911.923899] vfio-pci 0000:05:00.0: disconnected; not waiting
[ 7911.923905] vfio-pci 0000:05:00.0: bus failed with -25
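For comparison, the gap between each "reset via bus" message and the line that follows it can be computed directly from the printk timestamps. A minimal sketch in Python, using only lines copied from the log above; the parse() and reset_intervals() helpers are hypothetical and not part of any kernel tooling:

```python
import re

# Lines copied verbatim from the dmesg excerpt above (seconds since boot).
LOG = """\
[ 7904.756595] vfio-pci 0000:05:00.0: reset via bus
[ 7905.908987] vfio-pci 0000:05:00.0: ready 0ms after bus reset
[ 7906.856312] vfio-pci 0000:05:00.0: reset via bus
[ 7909.915925] pcieport 0000:02:02.0: Data Link Layer Link Active not set in 100 msec
"""

# "[ seconds.microseconds] message"; the fraction is kept as an integer
# so the arithmetic below is exact.
LINE_RE = re.compile(r"^\[\s*(\d+)\.(\d{6})\]\s+(.*)$")


def parse(log):
    """Return (timestamp_in_microseconds, message) pairs (hypothetical helper)."""
    entries = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            usec = int(m.group(1)) * 1_000_000 + int(m.group(2))
            entries.append((usec, m.group(3)))
    return entries


def reset_intervals(entries):
    """Pair each 'reset via bus' with the next message; return (elapsed_us, message)."""
    intervals = []
    for i, (ts, msg) in enumerate(entries):
        if msg.endswith("reset via bus") and i + 1 < len(entries):
            next_ts, next_msg = entries[i + 1]
            intervals.append((next_ts - ts, next_msg))
    return intervals


for elapsed_us, outcome in reset_intervals(parse(LOG)):
    print(f"{elapsed_us / 1000:9.3f} ms -> {outcome}")
```

On these timestamps, the first reset reports ready about 1.15 s after "reset via bus", while the second one logs the Link Active timeout about 3.06 s after it.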