Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 08, 2025 at 10:23:45AM +0800, Hui Wang wrote:
> Hi Bjorn,
> 
> Any progress on this issue, do we have a fix for this now? The
> ubuntu users are waiting for a fix :-).

Not yet, but thanks for the reminder.  Keep bugging me!

PCIe r7.0, sec 2.3.1, makes it clear that devices are permitted to
return RRS after FLR:

  ◦ For Configuration Requests only, if Device Readiness Status is not
    supported, following reset it is permitted for a Function to
    terminate the request and indicate that it is temporarily unable
    to process the Request, but will be able to process the Request in
    the future - in this case, the Request Retry Status (RRS)
    Completion Status must be used (see § Section 6.6). Valid reset
    conditions after which a device/Function is permitted to return
    RRS in response to a Configuration Request are:

    ▪ FLRs

    ...

But I am a little bit concerned because sec 2.3.2, which talks about
how a Root Complex handles that RRS and the RRS Software Visiblity
feature, says (note the "system reset" period):

  Root Complex handling of a Completion with Request Retry Status for
  a Configuration Request is implementation specific, except for the
  period following SYSTEM RESET (see § Section 6.6). For Root
  Complexes that support Configuration RRS Software Visibility, the
  following rules apply:

    ◦ If Configuration RRS Software Visibility is enabled:

      ▪ For a Configuration Read Request that includes both bytes of
	the Vendor ID field of a device Function's Configuration Space
	Header, the Root Complex must complete the Request to the host
	by returning a read-data value of 0001h for the Vendor ID
	field and all 1's for any additional bytes included in the
	request.

So I'm worried that the Software Visibility feature might work after
*system reset*, but not necessarily after an FLR.  That might make
sense because I don't think the RC can tell when we are doing an FLR
to a device.

It seems that after FLR, most RCs *do* make RRS visible via SV.  But
if we can't rely on that, I don't know how we're supposed to learn
when a device becomes ready.

Bjorn

> On 7/3/25 08:05, Hui Wang wrote:
> > On 7/2/25 17:43, Hui Wang wrote:
> > > On 7/2/25 07:23, Bjorn Helgaas wrote:
> > > > On Tue, Jun 24, 2025 at 08:58:57AM +0800, Hui Wang wrote:
> > > > > Sorry for late response, I was OOO the past week.
> > > > > 
> > > > > This is the log after applied your patch:
> > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/61
> > > > > 
> > > > > Looks like the "retry" makes the nvme work.
> > > >
> > > > Thank you!  It seems like we get 0xffffffff (probably PCIe
> > > > error) for a long time after we think the device should be
> > > > able to respond with RRS.
> > > > 
> > > > I always thought the spec required that after the delays, a
> > > > device should respond with RRS if it's not ready, but now I
> > > > guess I'm not 100% sure.  Maybe it's allowed to just do
> > > > nothing, which would lead to the Root Port timing out and
> > > > logging an Unsupported Request error.
> > > > 
> > > > Can I trouble you to try the patch below?  I think we might
> > > > have to start explicitly checking for that error.  That
> > > > probably would require some setup to enable the error, check
> > > > for it, and clear it.  I hacked in some of that here, but
> > > > ultimately some of it should go elsewhere.
> ...




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux