On Fri, Jun 13, 2025 at 06:08:43PM -0400, Jim Quinlan wrote:
> Whereas most PCIe HW returns 0xffffffff on illegal accesses and the like,
> by default Broadcom's STB PCIe controller effects an abort. Some SoCs --
> 7216 and its descendants -- have new HW that identifies error details.

What's the long term plan for this? This abort is a huge problem that
we're seeing across arm64 platforms. Forcing a panic and reboot for
every uncorrectable error is pretty hard to deal with.

Is there a plan to someday recover from these aborts? Or change the
hardware so it can at least be configured to return ~0 data after
logging the error in the hardware registers?

> This simple handler determines if the PCIe controller was the cause of the
> abort and if so, prints out diagnostic info. Unfortunately, an abort still
> occurs.
>
> Care is taken to read the error registers only when the PCIe bridge is
> active and the PCIe registers are acceptable. Otherwise, a "die" event
> caused by something other than the PCIe could cause an abort if the PCIe
> "die" handler tried to access registers when the bridge is off.

Checking whether the bridge is active is a "mostly-works" situation
since it's always racy.

> Example error output:
> brcm-pcie 8b20000.pcie: Error: Mem Acc: 32bit, Read, @0x38000000
> brcm-pcie 8b20000.pcie: Type: TO=0 Abt=0 UnspReq=1 AccDsble=0 BadAddr=0