Re: [PATCH v3 2/2] PCI: Fix runtime PM usage count underflow on device unplug

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[cc += Mika]

On Mon, Jun 23, 2025 at 06:37:33AM -0500, Mario Limonciello wrote:
> On 6/23/25 5:11 AM, Rafael J. Wysocki wrote:
> > On Mon, Jun 23, 2025 at 12:05 PM Lukas Wunner <lukas@xxxxxxxxx> wrote:
> > > pcie_portdrv_probe() and pcie_portdrv_remove() both call
> > > pci_bridge_d3_possible() to determine whether to use runtime power
> > > management.  The underlying assumption is that pci_bridge_d3_possible()
> > > always returns the same value because otherwise a runtime PM reference
> > > imbalance occurs.
> > > 
> > > That assumption falls apart if the device is inaccessible on ->remove()
> > > due to hot-unplug:  pci_bridge_d3_possible() calls pciehp_is_native(),
> > > which accesses Config Space to determine whether the device is Hot-Plug
> > > Capable.   An inaccessible device generally returns "all ones" for such
> > > Config Read Requests.  Hence the device may seem Hot-Plug Capable on
> > > ->remove() even though it wasn't on ->probe().
> > > 
> > > Use the cached copy of the Hot-Plug Capable bit to avoid the Config Space
> > > access and the resulting runtime PM ref imbalance.
> > > 
> > > Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
> > 
> > Reviewed-by: Rafael J. Wysocki <rafael@xxxxxxxxxx>
> 
> Tested-by: Mario Limonciello <mario.limonciello@xxxxxxx>>

I ended up changing the patch significantly, so I did not include
Rafael's Reviewed-by and Mario's Tested-by in the final patch.
My apologies for this!

Looking at the commit which introduced the Config Space read,
5352a44a561d, I got the impression that Mika may have deliberately
avoided using the is_hotplug_bridge flag.  Notably, is_hotplug_bridge
is also set by check_hotplug_bridge() in acpiphp_glue.c, and his
intention was probably to avoid matching those bridges in
pciehp_is_native().

So I decided to err on the side of caution and keep the Config Space
read if pciehp_is_native() is called from hotplug_is_native().
Just to avoid any potential regressions since the fix is tagged for
stable.

I also searched lore for occurrences of the keywords...

  pcieport Runtime PM usage count underflow

...and did find quite a few reports, but this error message was just
a side effect and the reports were about completely different issues.
It does prove though that this bug has existed for a while!

Thanks Laurent for the report and Mario for root-causing this!

Lukas




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux