Re: [PATCH] PCI: Fix link speed calculation on retrain failure

[+cc Sathy, Jiwei, Adrian]

On Mon, Jun 23, 2025 at 03:22:14PM +0200, Lukas Wunner wrote:
> When pcie_failed_link_retrain() fails to retrain, it tries to revert to
> the previous link speed.  However it calculates that speed from the Link
> Control 2 register without masking out non-speed bits first.
> 
> PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to
> PCI_SPEED_UNKNOWN, which in turn causes a WARN splat in
> pcie_set_target_speed():
> 
>   pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port
>   pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s
>   pci 0000:00:01.1: retraining failed
>   WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed
>   RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000
>   pcie_failed_link_retrain
>   pci_device_add
>   pci_scan_single_device
>   pci_scan_slot
>   pci_scan_child_bus_extend
>   acpi_pci_root_create
>   pci_acpi_scan_root
>   acpi_pci_root_add
>   acpi_bus_attach
>   device_for_each_child
>   acpi_dev_for_each_child
>   acpi_bus_attach
>   device_for_each_child
>   acpi_dev_for_each_child
>   acpi_bus_attach
>   acpi_bus_scan
>   acpi_scan_init
>   acpi_init
> 
> Per the calling convention of the System V AMD64 ABI, the arguments to
> pcie_set_target_speed(struct pci_dev *, enum pci_bus_speed, bool) are
> stored in RDI, RSI, RDX.  As visible above, RSI contains 0xff, i.e.
> PCI_SPEED_UNKNOWN.
> 
> Fixes: f68dea13405c ("PCI: Revert to the original speed after PCIe failed link retraining")
> Reported-by: Andrew <andreasx0@xxxxxxxxxxxxxx>
> Closes: https://lore.kernel.org/r/7iNzXbCGpf8yUMJZBQjLdbjPcXrEJqBxy5-bHfppz0ek-h4_-G93b1KUrm106r2VNF2FV_sSq0nENv4RsRIUGnlYZMlQr2ZD2NyB5sdj5aU=@protonmail.com/
> Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # v6.12+

I like the brevity of this patch, but I do worry that if we ever have
other users of PCIE_LNKCTL2_TLS2SPEED(), we might have the same
problem again.

Also, it looks like PCIE_LNKCAP_SLS2SPEED() has the same problem.

f68dea13405c predates PCIE_LNKCTL2_TLS2SPEED(), and I don't think this
problem existed as of f68dea13405c.  I think the Fixes: tag should be
for de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe
Link Speed"), which added PCIE_LNKCTL2_TLS2SPEED() and
PCIE_LNKCAP_SLS2SPEED() without masking out the other bits.

I think I'll take Jiwei's patch [1], which fixes
PCIE_LNKCTL2_TLS2SPEED() and PCIE_LNKCAP_SLS2SPEED() without requiring
changes in the users.  I'll add the details of Andrew's report to the
commit log.

[1] https://lore.kernel.org/all/20250123055155.22648-2-sjiwei@xxxxxxx/
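
For illustration only, here is a minimal userspace sketch of why masking
inside the conversion helper protects every caller.  It is not the kernel
code and not Jiwei's patch: the EX_/ex_ names are hypothetical, and the
constants are assumed to mirror the Target Link Speed field (bits 3:0 of
Link Control 2) and the enum pci_bus_speed encodings, with 0xff standing
for the unknown speed seen in RSI above.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions (assumed values). */
#define EX_LNKCTL2_TLS        0x000f  /* Target Link Speed field, bits 3:0 */
#define EX_LNKCTL2_TLS_2_5GT  0x0001
#define EX_LNKCTL2_TLS_5_0GT  0x0002
#define EX_LNKCTL2_TLS_8_0GT  0x0003

enum ex_speed {
	EX_SPEED_2_5GT   = 0x14,
	EX_SPEED_5_0GT   = 0x15,
	EX_SPEED_8_0GT   = 0x16,
	EX_SPEED_UNKNOWN = 0xff,
};

/*
 * Compares the value as given.  Any bit set outside the speed field
 * makes every comparison fail, so the result is EX_SPEED_UNKNOWN.
 */
static inline enum ex_speed ex_tls2speed_unmasked(uint16_t lnkctl2)
{
	switch (lnkctl2) {
	case EX_LNKCTL2_TLS_8_0GT: return EX_SPEED_8_0GT;
	case EX_LNKCTL2_TLS_5_0GT: return EX_SPEED_5_0GT;
	case EX_LNKCTL2_TLS_2_5GT: return EX_SPEED_2_5GT;
	default:                   return EX_SPEED_UNKNOWN;
	}
}

/*
 * Masks first, then converts.  Callers may pass the raw register
 * value without having to remember the mask themselves.
 */
static inline enum ex_speed ex_tls2speed_masked(uint16_t lnkctl2)
{
	return ex_tls2speed_unmasked(lnkctl2 & EX_LNKCTL2_TLS);
}
```

With a raw value such as 0x0043 (speed field 0x3 plus one bit outside the
field), the unmasked helper yields EX_SPEED_UNKNOWN while the masked one
yields EX_SPEED_8_0GT.  Masking at the call site, as in the quoted diff,
gives the same result for that one caller only.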

> ---
>  drivers/pci/quirks.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index d7f4ee6..deaaf4f 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -108,7 +108,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
>  	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
>  	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
>  	if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) {
> -		u16 oldlnkctl2 = lnkctl2;
> +		u16 oldlnkctl2 = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
>  
>  		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
>  
> -- 
> 2.47.2
> 



