Re: [PATCH] PCI/ACPI: Fix runtime PM ref imbalance on hot-plug capable ports

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[+cc Ilpo]

On Mon, Jun 23, 2025 at 07:08:20PM +0200, Lukas Wunner wrote:
> pcie_portdrv_probe() and pcie_portdrv_remove() both call
> pci_bridge_d3_possible() to determine whether to use runtime power
> management.  The underlying assumption is that pci_bridge_d3_possible()
> always returns the same value because otherwise a runtime PM reference
> imbalance occurs.
> 
> That assumption falls apart if the device is inaccessible on ->remove()
> due to hot-unplug:  pci_bridge_d3_possible() calls pciehp_is_native(),
> which accesses Config Space to determine whether the device is Hot-Plug
> Capable.   An inaccessible device returns "all ones", which is converted
> to "all zeroes" by pcie_capability_read_dword().  Hence the device no
> longer seems Hot-Plug Capable on ->remove() even though it was on
> ->probe().
> 
> The resulting runtime PM ref imbalance causes errors such as:
> 
>   pcieport 0000:02:04.0: Runtime PM usage count underflow!
> 
> The Hot-Plug Capable bit is cached in pci_dev->is_hotplug_bridge.

pci_dev->is_hotplug_bridge is not just a cache of PCI_EXP_SLTCAP_HPC;
it can also be set in two other cases.  Example below.

> pci_bridge_d3_possible() only calls pciehp_is_native() if that flag is
> set.  Re-checking the bit in pciehp_is_native() is thus unnecessary.
> 
> However pciehp_is_native() is also called from hotplug_is_native().  Move
> the Config Space access to that function.  The function is only invoked
> from acpiphp_glue.c, so move it there instead of keeping it in a publicly
> visible header.
> 
> Fixes: 5352a44a561d ("PCI: pciehp: Make pciehp_is_native() stricter")
> Reported-by: Laurent Bigonville <bigon@xxxxxxxx>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220216
> Reported-by: Mario Limonciello <mario.limonciello@xxxxxxx>
> Closes: https://lore.kernel.org/r/20250609020223.269407-3-superm1@xxxxxxxxxx/
> Link: https://lore.kernel.org/all/20250620025535.3425049-3-superm1@xxxxxxxxxx/T/#u
> Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # v4.18+
> ---
>  drivers/pci/hotplug/acpiphp_glue.c | 15 +++++++++++++++
>  drivers/pci/pci-acpi.c             |  5 -----
>  include/linux/pci_hotplug.h        |  4 ----
>  3 files changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index 5b1f271c6034..ae2bb8970f63 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -50,6 +50,21 @@ static void acpiphp_sanitize_bus(struct pci_bus *bus);
>  static void hotplug_event(u32 type, struct acpiphp_context *context);
>  static void free_bridge(struct kref *kref);
>  
> +static bool hotplug_is_native(struct pci_dev *bridge)
> +{
> +	u32 slot_cap;
> +
> +	pcie_capability_read_dword(bridge, PCI_EXP_SLTCAP, &slot_cap);
> +
> +	if (slot_cap & PCI_EXP_SLTCAP_HPC && pciehp_is_native(bridge))
> +		return true;
> +
> +	if (shpchp_is_native(bridge))
> +		return true;
> +
> +	return false;
> +}
> +
>  /**
>   * acpiphp_init_context - Create hotplug context and grab a reference to it.
>   * @adev: ACPI device object to create the context for.
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index b78e0e417324..57bce9cc8a38 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -816,15 +816,10 @@ int pci_acpi_program_hp_params(struct pci_dev *dev)
>  bool pciehp_is_native(struct pci_dev *bridge)
>  {
>  	const struct pci_host_bridge *host;
> -	u32 slot_cap;
>  
>  	if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
>  		return false;
>  
> -	pcie_capability_read_dword(bridge, PCI_EXP_SLTCAP, &slot_cap);
> -	if (!(slot_cap & PCI_EXP_SLTCAP_HPC))
> -		return false;

If we have a bridge where bridge->is_hotplug_bridge=1 from the acpiphp
check_hotplug_bridge() or quirk_hotplug_bridge(), and the bridge has
no Slot or a Slot with PCI_EXP_SLTCAP_HPC=0, we previously returned
false here but may now return true.  That seems like a problem, but
maybe I'm missing the reason why it is not an issue.  

Previously pciehp_is_native() depended on the bridge, but now it will
only depend on the negotiation with the platform for native PCIe
hotplug control, so it definitely changes the semantics.

If the new semantics are what we want, that would be great because I
think we could probably set host->native_pcie_hotplug up front based
on CONFIG_HOTPLUG_PCI_PCIE and pcie_ports_native, and
pciehp_is_native() would collapse to just an accessor for
host->native_pcie_hotplug.

>  	if (pcie_ports_native)
>  		return true;
>  
> diff --git a/include/linux/pci_hotplug.h b/include/linux/pci_hotplug.h
> index ec77ccf1fc4d..02efeea62b25 100644
> --- a/include/linux/pci_hotplug.h
> +++ b/include/linux/pci_hotplug.h
> @@ -102,8 +102,4 @@ static inline bool pciehp_is_native(struct pci_dev *bridge) { return true; }
>  static inline bool shpchp_is_native(struct pci_dev *bridge) { return true; }
>  #endif
>  
> -static inline bool hotplug_is_native(struct pci_dev *bridge)
> -{
> -	return pciehp_is_native(bridge) || shpchp_is_native(bridge);
> -}
>  #endif
> -- 
> 2.47.2
> 




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux