On 8/7/2025 5:52 AM, Keith Busch wrote:
On Wed, Aug 06, 2025 at 04:34:09PM -0500, Bjorn Helgaas wrote:
However, the current 4-second timeout in pci_dpc_recovered() is indeed
an empirical value rather than a hard requirement from the PCIe
specification. In real-world scenarios, like with Mellanox ConnectX-5/7
adapters, we've observed that full DPC recovery can take more than 5-6
seconds, which leads to premature hotplug processing and device removal.
I think Sathya's point was: Have you made an effort to talk to the
vendor and ask them to root-cause and fix the issue, e.g. with a
firmware update?
Would definitely be great, but unless we have a number in the spec to
point to, they might just shrug and ask what the requirement is.
I agree, and I have similar problems with other arbitrary kernel timing
decisions. Specifically RRL, where there's no spec-defined number, yet my
patch to modify it has not received much consideration.
https://lore.kernel.org/linux-pci/20250218165444.2406119-1-kbusch@xxxxxxxx/
At least with this patch, we have a workaround in hand to make some
devices work.
Thanks,
Ethan