On Wed, Aug 6, 2025 at 2:15 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Fri, Jun 13, 2025 at 06:08:43PM -0400, Jim Quinlan wrote:
> > Whereas most PCIe HW returns 0xffffffff on illegal accesses and the like,
> > by default Broadcom's STB PCIe controller effects an abort. Some SoCs --
> > 7216 and its descendants -- have new HW that identifies error details.
>
> What's the long term plan for this? This abort is a huge problem that
> we're seeing across arm64 platforms. Forcing a panic and reboot for
> every uncorrectable error is pretty hard to deal with.

Hello Bjorn,

Are you referring to STB/CM systems, Rpi, or something else altogether?

>
> Is there a plan to someday recover from these aborts? Or change the
> hardware so it can at least be configured to return ~0 data after
> logging the error in the hardware registers?

Some of our upcoming chips will have the ability to do nothing on
errant PCIe writes and return 0xffffffff on errant PCIe reads. But
none of our STB/CM chips do this currently. I've been asking for this
behavior for years but I have limited influence on what happens in HW.

> >
> > This simple handler determines if the PCIe controller was the cause of the
> > abort and if so, prints out diagnostic info. Unfortunately, an abort still
> > occurs.
> >
> > Care is taken to read the error registers only when the PCIe bridge is
> > active and the PCIe registers are acceptable. Otherwise, a "die" event
> > caused by something other than the PCIe could cause an abort if the PCIe
> > "die" handler tried to access registers when the bridge is off.
>
> Checking whether the bridge is active is a "mostly-works" situation
> since it's always racy.

I'm not sure I understand the "racy" comment. If the PCIe bridge is
off, we do not read the PCIe error registers. In this case, PCIe is
probably not the cause of the panic. In the rare case the PCIe bridge
is off and it was the PCIe that caused the panic, nothing gets
reported, and this is where we are without this commit. Perhaps this
is what you mean by "mostly-works". But this is the best that can be
done with SW given our HW.

Regards,
Jim Quinlan
Broadcom STB/CM

> >
> > Example error output:
> >   brcm-pcie 8b20000.pcie: Error: Mem Acc: 32bit, Read, @0x38000000
> >   brcm-pcie 8b20000.pcie:  Type: TO=0 Abt=0 UnspReq=1 AccDsble=0 BadAddr=0
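
The guard discussed above (read the error registers only when the bridge is active) can be sketched in plain C. This is a userspace model under stated assumptions, not the driver's code: `struct pcie_model`, its fields, `report_pcie_error()`, and the result enum are all hypothetical names made up for illustration, and the "registers" are ordinary struct members rather than MMIO.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the controller state; not taken from the driver. */
struct pcie_model {
    bool bridge_on;       /* true while the bridge is active */
    uint32_t err_status;  /* stand-in for an error-status register */
    uint32_t err_addr;    /* stand-in for the captured faulting address */
};

enum report_result {
    REPORT_SKIPPED,   /* bridge off: registers were not touched */
    REPORT_NOT_PCIE,  /* bridge on, but no PCIe error recorded */
    REPORT_PRINTED    /* bridge on and an error was reported */
};

/* Would be called from a "die" path in a real system. */
static enum report_result report_pcie_error(const struct pcie_model *p)
{
    /*
     * Check the bridge state first. On the hardware described in the
     * thread, reading registers with the bridge off could itself abort,
     * so the handler reports nothing in that case.
     */
    if (!p->bridge_on)
        return REPORT_SKIPPED;

    /* Bridge is on: a zero status means PCIe was not the cause. */
    if (p->err_status == 0)
        return REPORT_NOT_PCIE;

    printf("Error: Mem Acc @0x%08x, status 0x%08x\n",
           (unsigned)p->err_addr, (unsigned)p->err_status);
    return REPORT_PRINTED;
}
```

The sketch is single-threaded. It does not model the bridge state changing between the `bridge_on` check and the register read, which is the kind of window a "racy" objection would be about.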