Sorry for the delayed response here.

On Fri, 1 Aug 2025, Maciej W. Rozycki wrote:

> CESta: RxErr- BadTLP+ BadDLLP+ Rollover- Timeout- AdvNonFatalErr-

The information you sent is somewhat incomplete. I guess you probably
won't be able to get any of the LTSSM state information unless one of
the devices has an LTSSM log you can dump, but I doubt either of them
does.

When I see that BadTLP and BadDLLP are still set, it makes me suspect
that the hierarchy isn't configured correctly for those errors to go to
the root port. Or perhaps they're just being reported to the BIOS and
ignored or not cleared.

> but how would I gather such error information?

Let's try to figure out what is in control of AER and how/whether the
hierarchy is configured to send errors all the way to the root port.

First we have to look around the "OSC" related kernel logging and the
adjacent root port. In the example here from an Intel system we can see
the OS took control over AER (and other things) from the BIOS. We can
infer this was for the bus 4f root port since it's logged just after,
afaik. The negotiation happens on a per root port basis, so we need to
make sure it's the root port in the hierarchy of the devices we're
interested in. I've seen some BIOSes retain AER control over PCIe ports
on the PCH.

Example dmesg from during boot:

  acpi PNP0A08:04: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
  acpi PNP0A08:04: _OSC: OS now controls [PCIeHotplug PME AER PCIeCapability LTR]
  acpi PNP0A08:04: FADT indicates ASPM is unsupported, using BIOS configuration
  PCI host bridge to bus 0000:4f

We would want to look at lspci for the root port, the asmedia USP, the
asmedia DSP and the USP of the pericom switch (when able). I don't have
any nested switch configurations, but I think I can generalize it a
little. Maybe this is a correct configuration (using BDFs from a system
I have to start with).

  +-[0000:4f]-+-00.0  Intel Corporation ...
  |           +-...
  |           +-01.0-[50-57]--+-00.0-[51-57]--+-00.0-[52-53]

  RP:            4f:01.0
  USP (asmedia): 50:00.0
  DSP (asmedia): 51:00.0
  USP (pericom): 52:00.0

The root port can tell us if PCIe errors are going to the BIOS. If any
of ErrCorrectable, ErrNon-Fatal, or ErrFatal are set in RootCtl, then
those error types would most likely go to the BIOS even if the OS thinks
it took control. Someone will have to correct me if I'm wrong about ARM.

If you sent the full lspci -vvv of the root port and the USP/DSP/USP
combo, I could figure out what's going on.

lspci -vvv -s 4f:01.0
4f:01.0 PCI bridge: Intel Corporation Device 352a (rev 04) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        ...
        ...
        Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #25, Speed 32GT/s, Width x8, ASPM not supported
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 16GT/s, Width x8
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #0, PowerLimit 75W; Interlock- NoCompl-
                RootCap: CRSVisible+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
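
If it helps to script the two checks described in this mail, here is a
rough Python sketch. It only parses text you have already captured from
dmesg and lspci -vvv; it does not read config space itself. The function
names are made up for illustration, and the parsing assumes the
"Name+"/"Name-" flag layout and the "_OSC: OS now controls [...]"
wording shown in the output in this mail, which may differ between
pciutils and kernel versions.

```python
import re

# Flag names as lspci -vvv prints them on the RootCtl line.
ROOTCTL_ERR_FLAGS = ("ErrCorrectable", "ErrNon-Fatal", "ErrFatal")


def parse_flags(lspci_text, field):
    """Return {flag: bool} for the 'Name+'/'Name-' tokens on a line like 'RootCtl: ...'."""
    for line in lspci_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(field + ":"):
            tokens = stripped[len(field) + 1:].split()
            # Only keep tokens that end in '+' or '-'; the sign is the flag state.
            return {t[:-1]: t.endswith("+") for t in tokens if t[-1] in "+-"}
    return {}


def rootctl_errors_to_firmware(lspci_text):
    """List the RootCtl error bits that are set (hypothetical helper).

    Per the reasoning in this mail, a set bit suggests that error type
    is most likely routed to the BIOS rather than handled by the OS.
    """
    flags = parse_flags(lspci_text, "RootCtl")
    return [name for name in ROOTCTL_ERR_FLAGS if flags.get(name)]


def osc_controls(dmesg_text):
    """Map each ACPI host bridge to the features in its 'OS now controls' line."""
    pattern = re.compile(r"acpi (\S+): _OSC: OS now controls \[([^\]]*)\]")
    return {m.group(1): m.group(2).split() for m in pattern.finditer(dmesg_text)}


# Sample input copied from the output quoted in this mail.
sample_lspci = """\
        DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
        RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
"""
sample_dmesg = (
    "acpi PNP0A08:04: _OSC: OS now controls "
    "[PCIeHotplug PME AER PCIeCapability LTR]\n"
)

print(rootctl_errors_to_firmware(sample_lspci))           # prints []
print("AER" in osc_controls(sample_dmesg)["PNP0A08:04"])  # prints True
```

An empty list together with AER in the "OS now controls" set matches the
Intel example above, where the OS owns AER for that root port. You would
still need to match the ACPI device to the right root port by hand.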