Re: [PATCH] PCI: Disable RRS polling for Intel SSDPE2KX020T8 nvme

On 6/13/25 00:48, Bjorn Helgaas wrote:
> [+cc VMD folks]
>
> On Wed, Jun 11, 2025 at 06:14:42PM +0800, Hui Wang wrote:
>> Prior to commit d591f6804e7e ("PCI: Wait for device readiness with
>> Configuration RRS"), this Intel NVMe device [8086:0a54] worked well.
>> Since that patch was merged into the kernel, the NVMe device no
>> longer works.
>>
>> Through debugging, we found that the commit introduces RRS polling in
>> pci_dev_wait().  For this NVMe device, polling PCI_VENDOR_ID returns
>> ~0 while the config access is not ready yet, but the polling expects
>> either 0x0001 or a valid vendor ID, so the RRS polling doesn't work
>> for this device.
>
> Sorry for breaking this, and thanks for all your work in debugging
> this!  Issues like this are really hard to track down.

> I would think we would have heard about this earlier if the NVMe
> device were broken on all systems.  Maybe there's some connection with
> VMD?  From the non-working dmesg log in your bug report
> (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/+attachment/5879970/+files/dmesg-60.txt):
>
>    DMI: ASUSTeK COMPUTER INC. ESC8000 G4/Z11PG-D24 Series, BIOS 5501 04/17/2019
>    vmd 0000:d7:05.5: PCI host bridge to bus 10000:00
>    pci 10000:00:02.0: [8086:2032] type 01 class 0x060400 PCIe Root Port
>    pci 10000:00:02.0: PCI bridge to [bus 01]
>    pci 10000:00:02.0: bridge window [mem 0xf8000000-0xf81fffff]: assigned
>    pci 10000:01:00.0: [8086:0a54] type 00 class 0x010802 PCIe Endpoint
>    pci 10000:01:00.0: BAR 0 [mem 0x00000000-0x00003fff 64bit]
>
>    <I think vmd_enable_domain() calls pci_reset_bus() here>

Yes, and pci_dev_wait() is called here.  With the RRS polling, it gets
~0 from PCI_VENDOR_ID and then reads 0xffffffff back when programming
BAR 0 afterwards.  With the original polling method, pci_dev_wait()
provides enough delay, so the NVMe device works normally.

The line "[   10.193589] hhhhhhhhhhhhhhhhhhhhhhhhhhhh dev->device = 0a54 id = ffffffff" is output from pci_dev_wait(), please refer to https://launchpadlibrarian.net/798708446/LP2111521-dmesg-test9.txt


>    pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
>    pci 10000:01:00.0: BAR 0: error updating (high 0x00000000 != 0xffffffff)
>    pci 10000:01:00.0: BAR 0 [mem 0xf8010000-0xf8013fff 64bit]: assigned
>    pci 10000:01:00.0: BAR 0: error updating (0xf8010004 != 0xffffffff)
>    nvme nvme0: pci function 10000:01:00.0
>    nvme 10000:01:00.0: enabling device (0000 -> 0002)

> Things I notice:
>
>    - The 10000:01:00.0 NVMe device is behind a VMD bridge
>
>    - We successfully read the Vendor & Device IDs (8086:0a54)
>
>    - The NVMe device is uninitialized.  We successfully sized the BAR,
>      which included successful config reads and writes.  The BAR
>      wasn't assigned by BIOS, which is normal since it's behind VMD.
>
>    - We allocated space for BAR 0 but the config writes to program the
>      BAR failed.  The read back from the BAR was 0xffffffff; probably a
>      PCIe error, e.g., the NVMe device didn't respond.
>
>    - The device *did* respond when nvme_probe() enabled it: the
>      "enabling device (0000 -> 0002)" means pci_enable_resources() read
>      PCI_COMMAND and got 0x0000.
>
>    - The dmesg from the working config doesn't include the "enabling
>      device" line, which suggests that pci_enable_resources() saw
>      PCI_COMMAND_MEMORY (0x0002) already set and didn't bother setting
>      it again.  I don't know why it would already be set.
>
> d591f6804e7e really only changes pci_dev_wait(), which is used after
> device resets.  I think vmd_enable_domain() resets the VMD Root Ports
> after pci_scan_child_bus(), and maybe we're not waiting long enough
> afterwards.
>
> My guess is that we got the ~0 because we did a config read too soon
> after reset and the device didn't respond.  The Root Port would time
> out, log an error, and synthesize ~0 data to complete the CPU read
> (see PCIe r6.0, sec 2.3.2 implementation note).
>
> It's *possible* that we waited long enough but the NVMe device is
> broken and didn't respond when it should have, but my money is on a
> software defect.
>
> There are a few pci_dbg() calls about these delays; can you set
> CONFIG_DYNAMIC_DEBUG=y and boot with dyndbg="file drivers/pci/* +p" to
> collect that output?  Please also collect the "sudo lspci -vv" output
> from a working system.

I've already passed the testing request on to the bug reporters and am
waiting for their feedback:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111521/comments/55

Thanks,

Hui.



> Bjorn



