On 8/6/25 11:50, Bjorn Helgaas wrote:
>> I'm not sure I understand the "racy" comment. If the PCIe bridge is
>> off, we do not read the PCIe error registers. In this case, PCIe is
>> probably not the cause of the panic. In the rare case the PCIe
>> bridge is off and it was the PCIe that caused the panic, nothing
>> gets reported, and this is where we are without this commit.
>> Perhaps this is what you mean by "mostly-works". But this is the
>> best that can be done with SW given our HW.
>
> Right, my fault. The error report registers don't look like standard
> PCIe things, so I suppose they are on the host side, not the PCIe
> side, so they're probably guaranteed to be accessible and non-racy
> unless the bridge is in reset.

To expand upon that part: in the situation I ran into, the PCIe link was
down and we had therefore clock gated the PCIe root complex hardware to
conserve power. Eventually I hit a voluntary panic, and since all
registered panic notifiers are invoked in succession, the one registered
for the PCIe RC was invoked as well. Accessing the clock gated registers
does not work and triggers another fault, which is confusing and gets
mingled with the panic I was trying to debug in the first place. Hence
this check; a clock gated PCIe RC would not be logging any errors anyway.
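
To illustrate the guard described above, here is a minimal user-space
sketch, not the driver code. Everything in it is hypothetical and made up
for the example: struct fake_rc, its fields, fake_rc_read_err() and
fake_rc_panic_notifier(). It only models the idea that the notifier bails
out before touching any register when the RC is clock gated or the bridge
is in reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the RC state; not the real driver structure. */
struct fake_rc {
	bool clk_enabled;       /* false while the RC is clock gated */
	bool bridge_in_reset;   /* true while the bridge is held in reset */
	uint32_t err_status;    /* stands in for a host-side error register */
	unsigned int reg_reads; /* counts simulated register accesses */
};

/* Simulated register read; on real hardware this would fault if gated. */
static uint32_t fake_rc_read_err(struct fake_rc *rc)
{
	rc->reg_reads++;
	return rc->err_status;
}

/*
 * Simulated panic notifier. Returns true if the error register was read,
 * false if the access was skipped because the hardware is not reachable.
 */
static bool fake_rc_panic_notifier(struct fake_rc *rc)
{
	uint32_t status;

	assert(rc != NULL);

	/* Guard: never touch registers of a gated or reset RC. */
	if (!rc->clk_enabled || rc->bridge_in_reset)
		return false;

	status = fake_rc_read_err(rc);
	if (status)
		printf("fake_rc: error status 0x%08x\n", (unsigned int)status);

	return true;
}
```

With the guard in place, a gated RC is skipped without any register
access, so the notifier cannot add a second fault on top of the original
panic.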
--
Florian