Re: [PATCH v2 1/4] PCI: dw-rockchip: Do not enumerate bus before endpoint devices are ready

Bjorn Helgaas <helgaas@xxxxxxxxxx> · Tue, 3 Jun 2025 13:12:50 -0500

On Tue, Jun 03, 2025 at 04:08:15PM +0200, Niklas Cassel wrote:
> On Sat, May 31, 2025 at 12:17:43PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, May 30, 2025 at 02:43:47PM -0500, Bjorn Helgaas wrote:
> > > On Fri, May 30, 2025 at 07:24:53PM +0200, Niklas Cassel wrote:
> > > > On 30 May 2025 19:19:37 CEST, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > >
> > > > >I think all drivers should wait PCIE_T_RRS_READY_MS (100ms) after exit
> > > > >from Conventional Reset (if port only supports <= 5.0 GT/s) or after
> > > > >link training completes (if port supports > 5.0 GT/s).
> > > > >
> > > > >> So I don't think this is a device specific issue but rather
> > > > >> controller specific.  And this makes the Qcom patch that I dropped a
> > > > >> valid one (ofc with change in description).
> > > > >
> > > > >URL?
> > > > 
> > > > PATCH 4/4 of this series.
> > > 
> > > If you mean
> > > https://lore.kernel.org/r/20250506073934.433176-10-cassel@xxxxxxxxxx,
> > > that patch merely replaces "100" with PCIE_T_PVPERL_MS, which doesn't
> > > fix anything and is valid regardless of this Plextor-related patch
> > > ("PCI: dw-rockchip: Do not enumerate bus before endpoint devices are
> > > ready").
> > 
> > It is patch 2/4:
> > https://lore.kernel.org/all/20250506073934.433176-8-cassel@xxxxxxxxxx
> 
> Hello all,
> 
> I'm getting some mixed messages here.
> 
> If I understand Bjorn correctly, he would prefer an NVMe quirk, and looking
> at pci/next, PATCH 1/4 has been dropped.

Hmmm, sorry, I misinterpreted both 1/4 and 2/4.  I read them as "add
this delay so the PLEXTOR device works", but in fact, I think in both
cases, the delay is actually to enforce the PCIe r6.0, sec 6.6.1,
requirement for software to wait 100ms before issuing a config
request, and the fact that it makes PLEXTOR work is a side effect of
that.

The beginning of that 100ms delay is "exit from Conventional Reset"
(ports that support <= 5.0 GT/s) or "link training completes" (ports
that support > 5.0 GT/s).
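
To make the rule concrete, here is a minimal user-space sketch of picking
the event that starts the 100ms timer.  The enum, the function name, and
the MT/s parameter are hypothetical names for illustration, not existing
kernel API; only the 100ms value and the 5.0 GT/s threshold come from the
text above.

```c
/* Hypothetical illustration only; none of these names exist in the kernel. */
enum rrs_wait_start {
	RRS_WAIT_FROM_RESET_EXIT,	/* port supports <= 5.0 GT/s */
	RRS_WAIT_FROM_LINK_TRAINED,	/* port supports > 5.0 GT/s */
};

/* Same value as the PCIE_T_RRS_READY_MS mentioned in this thread. */
#define T_RRS_READY_MS	100

/*
 * max_speed_mts: highest link speed the port supports, in MT/s
 * (2500 = 2.5 GT/s, 5000 = 5.0 GT/s, 8000 = 8.0 GT/s, ...).
 * Returns the event from which software must wait T_RRS_READY_MS
 * before issuing the first config request.
 */
static enum rrs_wait_start rrs_wait_start_event(unsigned int max_speed_mts)
{
	if (max_speed_mts > 5000)
		return RRS_WAIT_FROM_LINK_TRAINED;

	return RRS_WAIT_FROM_RESET_EXIT;
}
```

Note that a 5.0 GT/s port still falls in the "exit from Conventional
Reset" case; only ports faster than that start the timer at the end of
link training.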

I think we lack that 100ms delay in dwc drivers in general.  The only
generic dwc delay is in dw_pcie_host_init() via the LINK_WAIT_SLEEP_MS
in dw_pcie_wait_for_link(), but that doesn't count because it's
*before* the link comes up.  We have to wait 100ms *after* exiting
Conventional Reset or completing link training.  

We don't know when the exit from Conventional Reset was, but it was
certainly before the link came up.  In the absence of a timestamp for
exit from reset, starting the wait after link-up is probably the best
we can do.  This could be either after dw_pcie_wait_for_link() finds
the link up or when we handle the link-up interrupt.
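
For the polling case, the ordering I have in mind looks roughly like the
sketch below.  It is a self-contained user-space model, not the dwc code:
struct fake_port, the helper names, and the retry and poll-interval values
are all assumptions made up for illustration.  The only point is that the
100ms sleep happens *after* the link is seen up, in addition to whatever
polling sleeps happened before.

```c
#include <stdbool.h>

/* Illustrative placeholders, not taken from any driver. */
#define POLL_MAX_RETRIES	10
#define POLL_SLEEP_MS		90
#define T_RRS_READY_MS		100	/* post-link-up wait discussed above */

/* Fake port standing in for real hardware so the sketch can run. */
struct fake_port {
	unsigned int polls_until_up;	/* polls that still report "down" */
	unsigned int total_slept_ms;	/* sum of all sleeps so far */
	unsigned int last_sleep_ms;	/* duration of the most recent sleep */
};

static bool fake_link_up(struct fake_port *p)
{
	if (p->polls_until_up == 0)
		return true;

	p->polls_until_up--;
	return false;
}

static void fake_sleep_ms(struct fake_port *p, unsigned int ms)
{
	p->total_slept_ms += ms;
	p->last_sleep_ms = ms;
}

/*
 * Poll for link-up, then wait T_RRS_READY_MS before the caller is
 * allowed to enumerate the bus.  Returns 0 on success, -1 if the link
 * never came up (in which case no post-link-up wait is done).
 */
static int wait_for_link_then_settle(struct fake_port *p)
{
	for (int i = 0; i < POLL_MAX_RETRIES; i++) {
		if (fake_link_up(p)) {
			/* The sleeps before this point do not count. */
			fake_sleep_ms(p, T_RRS_READY_MS);
			return 0;
		}
		fake_sleep_ms(p, POLL_SLEEP_MS);
	}

	return -1;
}
```

The link-up interrupt case would do the same T_RRS_READY_MS wait in the
handler path before rescanning the bus, just without the polling loop.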

Patches 1 and 2 would fix the link-up interrupt case.  I think we need
another patch for the dwc core for dw_pcie_wait_for_link().

I wish I'd had time to spend on this and include patches 1 and 2, but
we're up against the merge window wire and I'll be out at the end of this
week, so I think they'll have to wait.  It seems like something we can
still justify for v6.16 though.

This also means I don't think we should need an NVMe quirk.

> If I understand Mani correctly, he thinks that we should queue up PATCH 1/4
> and PATCH 2/4 (although with modified commit messages).
>
> As you know, I do not have the (problematic) Plextor drive, so if we go with
> the quirk option, then we would need to ask Laszlo nicely to retest.
> (And to provide the PCI device and PCI vendor ID of his NVMe device so we
> can write a quirk.)