[PATCH v2 0/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824

Sorry for the delayed response.

On Fri, 1 Aug 2025, Maciej W. Rozycki wrote:
> CESta: RxErr- BadTLP+ BadDLLP+ Rollover- Timeout- AdvNonFatalErr-

The information you sent is somewhat incomplete. You probably won't be able to
get any LTSSM state information unless one of the devices has an LTSSM log you
can dump, and I doubt either of them does.

Seeing that BadTLP and BadDLLP are still set makes me suspect that the
hierarchy isn't configured to route those errors to the root port. Or
perhaps they're being reported to the BIOS and then ignored or never
cleared.

> but how would I gather such error information?

Let's try to figure out what is in control of AER and how/whether the hierarchy
is configured to send errors all the way to the root port. First we have to look
at the "_OSC"-related kernel logging and the adjacent root port. In the example
below, from an Intel system, we can see the OS took control of AER (and other
things) from the BIOS. We can infer this was for the bus 4f root port since, as
far as I know, the host bridge is logged just after. The negotiation happens on
a per root port basis, so we need to make sure it's the root port in the
hierarchy of the devices we're interested in. I've seen some BIOSes retain AER
control over PCIe ports on the PCH.

Example dmesg output from boot:
acpi PNP0A08:04: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
acpi PNP0A08:04: _OSC: OS now controls [PCIeHotplug PME AER PCIeCapability LTR]
acpi PNP0A08:04: FADT indicates ASPM is unsupported, using BIOS configuration
PCI host bridge to bus 0000:4f
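
A rough sketch of how that check could be scripted, assuming the dmesg lines
look like the example above and that the "PCI host bridge to bus" line follows
the _OSC lines of the same host bridge (an inference, not something the kernel
guarantees). The function name is made up for illustration.

```python
import re

# Matches lines such as:
#   acpi PNP0A08:04: _OSC: OS now controls [PCIeHotplug PME AER ...]
OSC_CONTROLS = re.compile(r"acpi (\S+): _OSC: OS now controls \[([^\]]*)\]")
HOST_BRIDGE = re.compile(r"PCI host bridge to bus ([0-9a-fA-F]{4}:[0-9a-fA-F]{2})")


def osc_control_by_bus(dmesg_text):
    """Map each root bus to the features listed in "OS now controls".

    Assumption: the host bridge line comes after the _OSC lines that
    belong to it. An empty set means no "OS now controls" line was seen
    for that bridge, which is worth a closer look by hand.
    """
    result = {}
    pending = None  # features from the most recent "OS now controls" line
    for line in dmesg_text.splitlines():
        m = OSC_CONTROLS.search(line)
        if m:
            pending = set(m.group(2).split())
            continue
        m = HOST_BRIDGE.search(line)
        if m:
            result[m.group(1)] = pending if pending is not None else set()
            pending = None
    return result
```

Feeding it the output of dmesg and checking whether "AER" is in the set for
the bus of interest (0000:4f in the example) gives a quick first answer.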

We would want to look at lspci for the root port, the asmedia USP, the asmedia
DSP and the USP of the pericom switch (when able). I don't have any nested
switch configurations, but I think I can generalize it a little. Something like
this may be the right topology (using BDFs from a system I have as a starting
point).

 +-[0000:4f]-+-00.0 Intel Corporation ...
 |           +-...
 |           +-01.0-[50-57]--+-00.0-[51-57]--+-00.0-[52-53] 

RP: 4f:01.0
USP (asmedia): 50:00.0
DSP (asmedia): 51:00.0
USP (pericom): 52:00.0
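
To find the equivalent BDFs on another system, the sysfs device path can be
used, since each bridge appears as a parent directory of the devices below it.
A minimal sketch, assuming a Linux system with sysfs mounted at /sys; the
helper names are hypothetical.

```python
import os
import re

# A full PCI address as it appears in sysfs, e.g. 0000:4f:01.0
BDF = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def upstream_chain(sysfs_path):
    """Return the BDFs found in a resolved sysfs path, root port first."""
    return [part for part in sysfs_path.split("/") if BDF.match(part)]


def chain_for(bdf):
    """Resolve /sys/bus/pci/devices/<bdf> and list each device on the path."""
    return upstream_chain(os.path.realpath("/sys/bus/pci/devices/" + bdf))
```

With the topology above, chain_for("0000:52:00.0") would be expected to list
the root port, the asmedia USP, the asmedia DSP and the pericom USP in that
order, and each entry can then be passed to lspci -vvv -s.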

The root port can tell us if PCIe errors are going to the BIOS. If any of
ErrCorrectable, ErrNon-Fatal, or ErrFatal are set in RootCtl, then those
error types would most likely go to the BIOS even if the OS thinks it took
control. Someone will have to correct me if this is wrong for ARM. If you
send the full lspci -vvv output for the root port and the USP/DSP/USP combo,
I could figure out what's going on.

lspci -vvv -s 4f:01.0

4f:01.0 PCI bridge: Intel Corporation Device 352a (rev 04) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        ...
        ...
        Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #25, Speed 32GT/s, Width x8, ASPM not supported
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 16GT/s, Width x8
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #0, PowerLimit 75W; Interlock- NoCompl-
                RootCap: CRSVisible+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
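
The RootCtl check can be scripted against saved lspci -vvv output. This is only
a sketch: it assumes the "Name+" / "Name-" flag layout shown above, reads a
single line per register (continuation lines are ignored), and returns the
first matching register if the text covers several devices. The function names
are made up for illustration.

```python
import re

ROOTCTL_ERR_FLAGS = ("ErrCorrectable", "ErrNon-Fatal", "ErrFatal")


def parse_flags(lspci_text, register):
    """Return {flag: bool} for a line like 'RootCtl: ErrFatal- PMEIntEna+'."""
    for line in lspci_text.splitlines():
        name, sep, rest = line.strip().partition(":")
        if sep and name == register:
            # A flag is a token ending in '+' (set) or '-' (clear).
            return {m.group(1): m.group(2) == "+"
                    for m in re.finditer(r"(\S+?)([+-])(?=\s|$)", rest)}
    return {}


def rootctl_error_enables(lspci_text):
    """List which of the three RootCtl error enables are set."""
    flags = parse_flags(lspci_text, "RootCtl")
    return [f for f in ROOTCTL_ERR_FLAGS if flags.get(f)]
```

For the root port output above rootctl_error_enables() returns an empty list,
i.e. none of the three enables are set. The same parse_flags() helper works on
other registers, for example parse_flags(text, "CESta") to see which
correctable error bits are still latched.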




