Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme

Bjorn,

On Mon, Aug 11, 2025 at 06:04:45PM -0500, Bjorn Helgaas wrote:
> On Fri, Aug 08, 2025 at 10:23:45AM +0800, Hui Wang wrote:
> > Hi Bjorn,
> > 
> > Any progress on this issue? Do we have a fix for it now? The
> > Ubuntu users are waiting for a fix :-).
> 
> Not yet, but thanks for the reminder.  Keep bugging me!

Other distributions' users are waiting for the fix too!

Thanks,

> 
> PCIe r7.0, sec 2.3.1, makes it clear that devices are permitted to
> return RRS after FLR:
> 
>   ◦ For Configuration Requests only, if Device Readiness Status is not
>     supported, following reset it is permitted for a Function to
>     terminate the request and indicate that it is temporarily unable
>     to process the Request, but will be able to process the Request in
>     the future - in this case, the Request Retry Status (RRS)
>     Completion Status must be used (see § Section 6.6). Valid reset
>     conditions after which a device/Function is permitted to return
>     RRS in response to a Configuration Request are:
> 
>     ▪ FLRs
> 
>     ...
> 
> But I am a little bit concerned because sec 2.3.2, which talks about
> how a Root Complex handles that RRS and the RRS Software Visibility
> feature, says (note the "system reset" period):
> 
>   Root Complex handling of a Completion with Request Retry Status for
>   a Configuration Request is implementation specific, except for the
>   period following SYSTEM RESET (see § Section 6.6). For Root
>   Complexes that support Configuration RRS Software Visibility, the
>   following rules apply:
> 
>     ◦ If Configuration RRS Software Visibility is enabled:
> 
>       ▪ For a Configuration Read Request that includes both bytes of
>         the Vendor ID field of a device Function's Configuration Space
>         Header, the Root Complex must complete the Request to the host
>         by returning a read-data value of 0001h for the Vendor ID
>         field and all 1's for any additional bytes included in the
>         request.
> 
> So I'm worried that the Software Visibility feature might work after
> *system reset*, but not necessarily after an FLR.  That might make
> sense because I don't think the RC can tell when we are doing an FLR
> to a device.
> 
> It seems that after FLR, most RCs *do* make RRS visible via SV.  But
> if we can't rely on that, I don't know how we're supposed to learn
> when a device becomes ready.
> 
> Bjorn
> 
> > On 7/3/25 08:05, Hui Wang wrote:
> > > On 7/2/25 17:43, Hui Wang wrote:
> > > > On 7/2/25 07:23, Bjorn Helgaas wrote:
> > > > > On Tue, Jun 24, 2025 at 08:58:57AM +0800, Hui Wang wrote:
> > > > > > Sorry for late response, I was OOO the past week.
> > > > > > 
> > > > > > This is the log after applying your patch:
> > > > > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/61
> > > > > > 
> > > > > > Looks like the "retry" makes the nvme work.
> > > > >
> > > > > Thank you!  It seems like we get 0xffffffff (probably PCIe
> > > > > error) for a long time after we think the device should be
> > > > > able to respond with RRS.
> > > > > 
> > > > > I always thought the spec required that after the delays, a
> > > > > device should respond with RRS if it's not ready, but now I
> > > > > guess I'm not 100% sure.  Maybe it's allowed to just do
> > > > > nothing, which would lead to the Root Port timing out and
> > > > > logging an Unsupported Request error.
> > > > > 
> > > > > Can I trouble you to try the patch below?  I think we might
> > > > > have to start explicitly checking for that error.  That
> > > > > probably would require some setup to enable the error, check
> > > > > for it, and clear it.  I hacked in some of that here, but
> > > > > ultimately some of it should go elsewhere.
> > ...



