Re: [PATCH v3 2/2] PCI: Fix runtime PM usage count underflow on device unplug

Mario Limonciello <superm1@xxxxxxxxxx> · Sun, 22 Jun 2025 20:47:03 -0500

On 6/22/25 1:39 PM, Mario Limonciello wrote:
On 6/21/2025 11:43 PM, Lukas Wunner wrote:
On Sat, Jun 21, 2025 at 02:56:08PM -0500, Mario Limonciello wrote:
On 6/21/25 2:05 PM, Lukas Wunner wrote:
In the dmesg output attached to...

https://bugzilla.kernel.org/show_bug.cgi?id=220216

... the device exhibiting the refcount underflow is a PCIe port.
Are you also seeing this on a PCIe port or is it a different device?

The device with the underflow is the disconnected PCIe bridge.

In my case it was this bridge that was downstream.

Okay, in both cases the refcount underflow occurs on a PCIe port.
So it seems likely the gratuitous refcount decrement is in portdrv.c
or one of the port services drivers.

Your patch changes the code path for *all* PCI devices.
Not just PCIe ports.  Hence it's likely not the right fix.

It may fix the issue on this particular PCIe port but
I strongly suspect it'll leak a runtime PM ref on all other devices.


Thanks, I see your point.


So the refcount decrement happens in pcie_portdrv_probe() and
the refcount increment happens in pcie_portdrv_remove().
Both times it's conditional on pci_bridge_d3_possible().
Does that return a different value on probe versus remove?

Could you please answer this?

I did this check and yes specifically on this PCIe port with the 
underflow the d3 possible lookup returns false during 
pcie_portdrv_remove().  It returns true during pcie_portdrv_probe().



Does any of the port service drivers decrement the refcount
once too often?  I've just looked through pciehp but cannot
find anything out of the ordinary.

Looking through recent changes, 002bf2fbc00e and bca84a7b93fd
look like potential candidates causing a regression, but the
former is for AER (which isn't used in the dmesg attached to
the bugzilla) and the latter touches suspend on system sleep,
not runtime suspend.

Can you maybe instrument the pm_runtime_{get,put}*() functions
with a printk() and/or dump_stack() to see where a gratuitous
refcount decrement occurs?

That's exactly what I did to conclude this call was an extra one.

Here's the drop to 0:

The drop to 0 is uninteresting.  You need to record *all*
refcount increments/decrements so that we can see where the
gratuitous one occurs.  It happens earlier than the drop to 0.

However, please first check whether the pci_bridge_d3_possible()
return value changes on probe versus remove of the PCIe port
in question.  If it does, then that's the root cause and there's
no need to look any further.


That was a great hypothesis that's spot on.

Just for posterity this was all the increment/decrement calls that I saw 
happen.

pci 0000:02:04.0: inc usage cnt from 0, caller: pci_pm_init+0x84/0x2d0
pci 0000:02:04.0: inc usage cnt from 1, caller: 
pci_scan_bridge_extend+0x6d/0x710
pci 0000:02:04.0: dec usage cnt from 2, caller: 
pci_scan_bridge_extend+0x19e/0x710
pci 0000:02:04.0: inc usage cnt from 1, caller: 
pci_scan_bridge_extend+0x6d/0x710
pci 0000:02:04.0: dec usage cnt from 2, caller: 
pci_scan_bridge_extend+0x19e/0x710
pcieport 0000:02:04.0: inc usage cnt from 1, caller: 
local_pci_probe+0x2d/0xa0
pcieport 0000:02:04.0: inc usage cnt from 2, caller: 
__device_attach+0x9c/0x1b0
pcieport 0000:02:04.0: inc usage cnt from 3, caller: 
__driver_probe_device+0x5c/0x150
pcieport 0000:02:04.0: dec usage cnt from 4, caller: 
__driver_probe_device+0x9a/0x150
pcieport 0000:02:04.0: dec usage cnt from 3, caller: 
__device_attach+0x145/0x1b0
pcieport 0000:02:04.0: dec usage cnt from 2, caller: 
pcie_portdrv_probe+0x19d/0x6d0
pcieport 0000:02:04.0: dec usage cnt from 1, caller: 
pcie_portdrv_probe+0x1a5/0x6d0
pcieport 0000:02:04.0: inc usage cnt from 0, caller: 
device_release_driver_internal+0xac/0x200
pcieport 0000:02:04.0: dec usage cnt from 1, caller: 
device_release_driver_internal+0x197/0x200
pcieport 0000:02:04.0: inc usage cnt from 0, caller: 
pci_device_remove+0x2d/0xb0
pcieport 0000:02:04.0: dec usage cnt from 0, caller: 
pci_device_remove+0x7e/0xb0
pcieport 0000:02:04.0: Runtime PM usage cnt underflow!

What's your suggestion on what to actually do here then?



Actually I came up with the idea to forbid runtime PM on the service 
when it doesn't allow d3 at probe which I believe means no need to check 
again on remove.

This works cleanly for me.  LMK what you think of this.

diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index e8318fd5f6ed5..a85cd7412cf4d 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -717,6 +717,8 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
                pm_runtime_mark_last_busy(&dev->dev);
                pm_runtime_put_autosuspend(&dev->dev);
                pm_runtime_allow(&dev->dev);
+       } else {
+               pm_runtime_forbid(&dev->dev);
        }

        return 0;
@@ -724,11 +726,9 @@ static int pcie_portdrv_probe(struct pci_dev *dev,

 static void pcie_portdrv_remove(struct pci_dev *dev)
 {
-       if (pci_bridge_d3_possible(dev)) {
-               pm_runtime_forbid(&dev->dev);
-               pm_runtime_get_noresume(&dev->dev);
-               pm_runtime_dont_use_autosuspend(&dev->dev);
-       }
+       pm_runtime_forbid(&dev->dev);
+       pm_runtime_get_noresume(&dev->dev);
+       pm_runtime_dont_use_autosuspend(&dev->dev);

        pcie_port_device_remove(dev);

@@ -737,11 +737,9 @@ static void pcie_portdrv_remove(struct pci_dev *dev)

 static void pcie_portdrv_shutdown(struct pci_dev *dev)
 {
-       if (pci_bridge_d3_possible(dev)) {
-               pm_runtime_forbid(&dev->dev);
-               pm_runtime_get_noresume(&dev->dev);
-               pm_runtime_dont_use_autosuspend(&dev->dev);
-       }
+       pm_runtime_forbid(&dev->dev);
+       pm_runtime_get_noresume(&dev->dev);
+       pm_runtime_dont_use_autosuspend(&dev->dev);

        pcie_port_device_remove(dev);
 }