Re: [PATCH] PCI: Fix link speed calculation on retrain failure

Tested and passed.
But what I mean is: why try to retrain the link of a connected discrete GPU that the BIOS has disabled? Is it normal for the link of a disabled device to drop to 2.5GT/s?


On Wednesday, June 25th, 2025 at 20:46, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:

> On Wed, Jun 25, 2025 at 04:06:58PM +0000, andreasx0 wrote:
> 

> > Again: as I said, the patch from Lukas fixed the warning, which was
> > triggered because the discrete NVIDIA GPU was disabled by the BIOS.
> 

> 

> The series I applied is at
> https://lore.kernel.org/all/20250123055155.22648-1-sjiwei@xxxxxxx/.
> The patches currently queued are at
> https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/log/?h=enumeration
> 

> I cc'd you on my response to that series, so if you think the commit
> log needs a change, feel free to suggest something in that thread.
> It's a generic problem, not anything specific to the GPU, so I just
> included the log messages a user would see when the problem happens.
> 

> I added your Reported-by because I think the first patch [2] should
> fix the problem you saw. If it doesn't, please let me know. If you
> test it and it does fix the problem, I'd be happy to add your
> Tested-by as well.
> 

> Thanks very much for reporting this issue and giving it a nudge to get
> it fixed!
> 

> [2] https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?id=9989e0ca7462
> 

> > On Tuesday, June 24th, 2025 at 21:13, Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> wrote:
> > 

> > > On 6/24/25 9:48 AM, Bjorn Helgaas wrote:
> > 

> > > > [+cc Sathy, Jiwei, Adrian]
> > 

> > > > On Mon, Jun 23, 2025 at 03:22:14PM +0200, Lukas Wunner wrote:
> > 

> > > > > When pcie_failed_link_retrain() fails to retrain, it tries to revert to
> > > > > the previous link speed. However it calculates that speed from the Link
> > > > > Control 2 register without masking out non-speed bits first.
> > 

> > > > > PCIE_LNKCTL2_TLS2SPEED() converts such incorrect values to
> > > > > PCI_SPEED_UNKNOWN, which in turn causes a WARN splat in
> > > > > pcie_set_target_speed():
> > 

> > > > > pci 0000:00:01.1: [1022:14ed] type 01 class 0x060400 PCIe Root Port
> > > > > pci 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s
> > > > > pci 0000:00:01.1: retraining failed
> > > > > WARNING: CPU: 1 PID: 1 at drivers/pci/pcie/bwctrl.c:168 pcie_set_target_speed
> > > > > RDX: 0000000000000001 RSI: 00000000000000ff RDI: ffff9acd82efa000
> > > > > pcie_failed_link_retrain
> > > > > pci_device_add
> > > > > pci_scan_single_device
> > > > > pci_scan_slot
> > > > > pci_scan_child_bus_extend
> > > > > acpi_pci_root_create
> > > > > pci_acpi_scan_root
> > > > > acpi_pci_root_add
> > > > > acpi_bus_attach
> > > > > device_for_each_child
> > > > > acpi_dev_for_each_child
> > > > > acpi_bus_attach
> > > > > device_for_each_child
> > > > > acpi_dev_for_each_child
> > > > > acpi_bus_attach
> > > > > acpi_bus_scan
> > > > > acpi_scan_init
> > > > > acpi_init
> > 

> > > > > Per the calling convention of the System V AMD64 ABI, the arguments to
> > > > > pcie_set_target_speed(struct pci_dev *, enum pci_bus_speed, bool) are
> > > > > stored in RDI, RSI, RDX. As visible above, RSI contains 0xff, i.e.
> > > > > PCI_SPEED_UNKNOWN.
> > 
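To make the failure mode above concrete, here is a minimal userspace sketch (not the kernel code) of an exact-match speed conversion fed with an unmasked Link Control 2 value. The `demo_` names, the constants (assumed to match the kernel headers) and the raw value 0x0023 are assumptions for illustration; the report does not say which non-speed bits were actually set.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; assumed to match include/uapi/linux/pci_regs.h. */
#define DEMO_LNKCTL2_TLS	0x000f	/* Target Link Speed field, bits 3:0 */
#define DEMO_LNKCTL2_HASD	0x0020	/* one non-speed bit in the same register */
#define DEMO_TLS_2_5GT		0x0001
#define DEMO_TLS_5_0GT		0x0002
#define DEMO_TLS_8_0GT		0x0003

/* The demo only holds if the non-speed bit lies outside the speed field. */
static_assert((DEMO_LNKCTL2_TLS & DEMO_LNKCTL2_HASD) == 0,
	      "non-speed bit overlaps the Target Link Speed field");

enum demo_bus_speed {
	DEMO_SPEED_2_5GT   = 0x14,
	DEMO_SPEED_5_0GT   = 0x15,
	DEMO_SPEED_8_0GT   = 0x16,
	DEMO_SPEED_UNKNOWN = 0xff,	/* the 0xff seen in RSI above */
};

/* Exact-match conversion: any stray bit makes the lookup fall through. */
enum demo_bus_speed demo_tls2speed(uint16_t tls)
{
	switch (tls) {
	case DEMO_TLS_2_5GT:
		return DEMO_SPEED_2_5GT;
	case DEMO_TLS_5_0GT:
		return DEMO_SPEED_5_0GT;
	case DEMO_TLS_8_0GT:
		return DEMO_SPEED_8_0GT;
	default:
		return DEMO_SPEED_UNKNOWN;
	}
}

/* Caller-side fix, in the spirit of the one-line patch: mask, then convert. */
enum demo_bus_speed demo_speed_masked(uint16_t lnkctl2)
{
	return demo_tls2speed(lnkctl2 & DEMO_LNKCTL2_TLS);
}
```

With the hypothetical raw value 0x0023 (8GT/s plus one non-speed bit), `demo_tls2speed(0x0023)` falls through to `DEMO_SPEED_UNKNOWN`, while `demo_speed_masked(0x0023)` yields `DEMO_SPEED_8_0GT`.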

> > > > > Fixes: f68dea13405c ("PCI: Revert to the original speed after PCIe failed link retraining")
> > > > > Reported-by: Andrew <andreasx0@xxxxxxxxxxxxxx>
> > > > > Closes: https://lore.kernel.org/r/7iNzXbCGpf8yUMJZBQjLdbjPcXrEJqBxy5-bHfppz0ek-h4_-G93b1KUrm106r2VNF2FV_sSq0nENv4RsRIUGnlYZMlQr2ZD2NyB5sdj5aU=@protonmail.com/
> > > > > Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
> > > > > Cc: stable@xxxxxxxxxxxxxxx # v6.12+
> > > > I like the brevity of this patch, but I do worry that if we ever have
> > > > other users of PCIE_LNKCTL2_TLS2SPEED(), we might have the same
> > > > problem again.
> > 

> > > > Also, it looks like PCIE_LNKCAP_SLS2SPEED() has the same problem.
> > 

> > > > f68dea13405c predates PCIE_LNKCTL2_TLS2SPEED(), and I don't think this
> > > > problem existed as of f68dea13405c. I think the Fixes: tag should be
> > > > for de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe
> > > > Link Speed"), which added PCIE_LNKCTL2_TLS2SPEED() and
> > > > PCIE_LNKCAP_SLS2SPEED() without masking out the other bits.
> > 
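A hedged sketch of the macro-level approach discussed here: if the conversion macros mask their argument themselves, callers can keep passing raw register values. This is not Jiwei's patch; the `EX_`/`ex_` names are hypothetical, the constants are assumed from the kernel headers, and the sketch assumes both fields use the same speed encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative masks; assumed to match include/uapi/linux/pci_regs.h. */
#define EX_LNKCTL2_TLS	0x000fu		/* Target Link Speed, bits 3:0 */
#define EX_LNKCAP_SLS	0x0000000fu	/* Supported Link Speeds, bits 3:0 */

/* The shared helper below relies on both fields being the same 4 bits. */
static_assert(EX_LNKCTL2_TLS == EX_LNKCAP_SLS,
	      "speed fields are expected to occupy the same bits");

enum ex_bus_speed {
	EX_SPEED_2_5GT   = 0x14,
	EX_SPEED_5_0GT   = 0x15,
	EX_SPEED_8_0GT   = 0x16,
	EX_SPEED_16_0GT  = 0x17,
	EX_SPEED_UNKNOWN = 0xff,
};

/* Map an already-isolated field value (1..4) onto the speed enum. */
static inline enum ex_bus_speed ex_field2speed(uint32_t field)
{
	if (field >= 1 && field <= 4)
		return (enum ex_bus_speed)(EX_SPEED_2_5GT + field - 1);
	return EX_SPEED_UNKNOWN;
}

/* The masking lives in the macros, so every user gets it for free. */
#define EX_LNKCTL2_TLS2SPEED(lnkctl2) \
	ex_field2speed((lnkctl2) & EX_LNKCTL2_TLS)
#define EX_LNKCAP_SLS2SPEED(lnkcap) \
	ex_field2speed((lnkcap) & EX_LNKCAP_SLS)
```

Under these assumptions, `EX_LNKCTL2_TLS2SPEED(0x0023)` and `EX_LNKCAP_SLS2SPEED(0x00000104)` resolve to 8GT/s and 16GT/s respectively, even though both raw values carry bits outside the speed field.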

> > > > I think I'll take Jiwei's patch [1], which fixes
> > > > PCIE_LNKCTL2_TLS2SPEED() and PCIE_LNKCAP_SLS2SPEED() without requiring
> > > > changes in the users. I'll add the details of Andrew's report to the
> > > > commit log.
> > 

> > > Agree. It is better to fix it in the macro.
> > 

> > > > [1] https://lore.kernel.org/all/20250123055155.22648-2-sjiwei@xxxxxxx/
> > 

> > > > > ---
> > > > > drivers/pci/quirks.c | 2 +-
> > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > 

> > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > > index d7f4ee6..deaaf4f 100644
> > > > > --- a/drivers/pci/quirks.c
> > > > > +++ b/drivers/pci/quirks.c
> > > > > @@ -108,7 +108,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
> > > > >  	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
> > > > >  	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> > > > >  	if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) {
> > > > > -		u16 oldlnkctl2 = lnkctl2;
> > > > > +		u16 oldlnkctl2 = lnkctl2 & PCI_EXP_LNKCTL2_TLS;
> > > > >
> > > > >  		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
> > 

> > > > > --
> > > > > 2.47.2
> > 

> > > --
> > > Sathyanarayanan Kuppuswamy
> > > Linux Kernel Developer