Re: [PATCH] PCI/ACPI: Fix runtime PM ref imbalance on hot-plug capable ports

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 26, 2025 at 09:56:07PM -0500, Bjorn Helgaas wrote:
> On Mon, Jun 23, 2025 at 07:08:20PM +0200, Lukas Wunner wrote:
> > pcie_portdrv_probe() and pcie_portdrv_remove() both call
> > pci_bridge_d3_possible() to determine whether to use runtime power
> > management.  The underlying assumption is that pci_bridge_d3_possible()
> > always returns the same value because otherwise a runtime PM reference
> > imbalance occurs.
> > 
> > That assumption falls apart if the device is inaccessible on ->remove()
> > due to hot-unplug:  pci_bridge_d3_possible() calls pciehp_is_native(),
> > which accesses Config Space to determine whether the device is Hot-Plug
> > Capable.   An inaccessible device returns "all ones", which is converted
> > to "all zeroes" by pcie_capability_read_dword().  Hence the device no
> > longer seems Hot-Plug Capable on ->remove() even though it was on
> > ->probe().
> > 
> > The resulting runtime PM ref imbalance causes errors such as:
> > 
> >   pcieport 0000:02:04.0: Runtime PM usage count underflow!
> > 
> > The Hot-Plug Capable bit is cached in pci_dev->is_hotplug_bridge.
> 
> pci_dev->is_hotplug_bridge is not just a cache of PCI_EXP_SLTCAP_HPC;
> it can also be set in two other cases.  Example below.
[...]
> If we have a bridge where bridge->is_hotplug_bridge=1 from the acpiphp
> check_hotplug_bridge() or quirk_hotplug_bridge(), and the bridge has
> no Slot or a Slot with PCI_EXP_SLTCAP_HPC=0, we previously returned
> false here but may now return true.  That seems like a problem, but
> maybe I'm missing the reason why it is not an issue.

You're absolutely right, thanks for spotting this.
I'm respinning the patch as part of a larger series:

https://lore.kernel.org/r/cover.1752390101.git.lukas@xxxxxxxxx

It seems to me we should cache PCI_EXP_SLTCAP_HPC, rather than
the result of pci_bridge_d3_possible().  That allows cleaning up
is_hotplug_bridge usage and fix a few bugs in that area.

(As an aside, quirk_hotplug_bridge() is for a Conventional PCI hotplug
bridge.  Hence I'm mentioning Conventional PCI in addition to ACPI slots
in a number of code comments and commit messages in the above-linked
series.)

> If the new semantics are what we want, that would be great because I
> think we could probably set host->native_pcie_hotplug up front based
> on CONFIG_HOTPLUG_PCI_PCIE and pcie_ports_native, and
> pciehp_is_native() would collapse to just an accessor for
> host->native_pcie_hotplug.

I've looked into it, see patch [5/5] of the series.  It turns out,
we can get rid of the pcie_ports_native check in pciehp_is_native(),
but not the CONFIG_HOTPLUG_PCI_PCIE check.  So it's only a partial
solution.  And it has the downside that the pcie_ports_native
variable is no longer confined to the PCI core, it now leaks into
ACPI PCI code.  Which means future canges in that area will involve
ACPI and require an ack from Rafael.  Your decision whether that's a
net-positive.  Feel free to ignore the patch if you think it's not!

Thanks!

Lukas




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux