On Fri, Jul 11, 2025 at 11:20:15AM +0800, Hongbo Yao wrote: > 2025/7/8 1:04, Sathyanarayanan Kuppuswamy: > > On 7/7/25 3:30 AM, Andy Xu wrote: > > > Setting timeout to 7s covers both devices with safety margin. > > > > Instead of updating the recovery time, can you check why your device > > recovery takes > > such a long time and how to fix it from the device end? > > I fully agree that ideally the root cause should be addressed on the > device side to reduce the DPC recovery latency, and that waiting longer > in the kernel is not a perfect solution. > > However, the current 4 seconds timeout in pci_dpc_recovered() is indeed > an empirical value rather than a hard requirement from the PCIe > specification. In real-world scenarios, like with Mellanox ConnectX-5/7 > adapters, we've observed that full DPC recovery can take more than 5-6 > seconds, which leads to premature hotplug processing and device removal. I think Sathya's point was: Have you made an effort to talk to the vendor and ask them to root-cause and fix the issue e.g. with a firmware update. > To improve robustness and maintain flexibility, I???m considering > introducing a module parameter to allow tuning the DPC recovery timeout > dynamically. Would you like me to prepare and submit such a patch for > review? We try to avoid adding new module parameters. Things should just work out of the box without the user having to adjust the kernel command line for their system. So the solution is indeed to either adjust the delay for everyone (as you've done) or introduce an unsigned int to struct pci_dev which can be assigned the delay after reset for the device to be responsive. For comparison, we're allowing up to 60 sec for devices to become available after a Fundamental Reset or Conventional Reset (PCIE_RESET_READY_POLL_MS). That's how long we're waiting in dpc_reset_link() -> pci_bridge_wait_for_secondary_bus() and we're not consistent with that when we wait only 4 sec in pci_dpc_recovered(). I think the reason is that we weren't really sure whether this approach to synchronize hotplug with DPC works well and how to choose delays. But we've had this for a few years now and it seems to have worked nicely for people. I think this is the first report where it's not been working out of the box. Thanks, Lukas