On Tue, Jun 24, 2025 at 09:24:07AM -0500, Bjorn Helgaas wrote:
> On Mon, Jun 23, 2025 at 07:08:20PM +0200, Lukas Wunner wrote:
> > pcie_portdrv_probe() and pcie_portdrv_remove() both call
> > pci_bridge_d3_possible() to determine whether to use runtime power
> > management. The underlying assumption is that pci_bridge_d3_possible()
> > always returns the same value because otherwise a runtime PM reference
> > imbalance occurs.
> >
> > That assumption falls apart if the device is inaccessible on ->remove()
> > due to hot-unplug: pci_bridge_d3_possible() calls pciehp_is_native(),
> > which accesses Config Space to determine whether the device is Hot-Plug
> > Capable. An inaccessible device returns "all ones", which is converted
> > to "all zeroes" by pcie_capability_read_dword(). Hence the device no
> > longer seems Hot-Plug Capable on ->remove() even though it was on
> > ->probe().
>
> This is pretty subtle; thanks for chasing it down.
>
> It doesn't look like anything in pci_bridge_d3_possible() should
> change over the life of the device, although acpi_pci_bridge_d3() is
> non-trivial.
>
> Should we consider calling pci_bridge_d3_possible() only once and
> caching the result? We already call it in pci_pm_init() and save the
> result in dev->bridge_d3. That member can be changed by
> pci_bridge_d3_update(), but we could add another copy that we never
> update after pci_pm_init().
>
> I worry a little that the fix is equally subtle and we could easily
> reintroduce this issue with future code reorganization.

I think this fix makes sense regardless of whether or not the return
value of pci_bridge_d3_possible() is cached:

Right now pciehp_is_native() reads the Hot-Plug Capable bit from the
register even though the bit is cached in pci_dev->is_hotplug_bridge
and pci_bridge_d3_possible() only calls pciehp_is_native() if that flag
is set. In other words, pciehp_is_native() is re-checking the condition
under which it was called. That's just nonsensical and superfluous.

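To make the failure mode concrete, here is a rough userspace model of it.
This is only a sketch, not kernel code: every name in it (model_dev,
model_read_slot_cap(), d3_possible_reread(), d3_possible_cached(),
model_probe(), model_remove(), model_unplug_cycle()) is made up for
illustration, the bit value is a stand-in, and the real
pci_bridge_d3_possible() checks considerably more than this. It only
assumes what is described above: an inaccessible device reads as "all
ones", the accessor turns that into zero, and ->probe() and ->remove()
each act on the result of the same check.

```c
/* Userspace model only: every name below is hypothetical, not a kernel API. */
#include <stdbool.h>
#include <stdint.h>

#define SLOT_CAP_HOTPLUG 0x40u	/* stand-in for the Hot-Plug Capable bit */

struct model_dev {
	bool accessible;	/* false once the device is hot-unplugged */
	uint32_t slot_cap;	/* register contents while accessible */
	bool is_hotplug_bridge;	/* flag cached at enumeration time */
	int pm_usage;		/* stand-in for the runtime PM usage count */
};

/* An inaccessible device reads as all ones; the accessor reports zero. */
static uint32_t model_read_slot_cap(const struct model_dev *dev)
{
	uint32_t val = dev->accessible ? dev->slot_cap : 0xffffffffu;

	return val == 0xffffffffu ? 0 : val;
}

/* Variant A: re-reads the register on every call. */
static bool d3_possible_reread(const struct model_dev *dev)
{
	return dev->is_hotplug_bridge &&
	       (model_read_slot_cap(dev) & SLOT_CAP_HOTPLUG);
}

/* Variant B: relies only on the cached flag. */
static bool d3_possible_cached(const struct model_dev *dev)
{
	return dev->is_hotplug_bridge;
}

typedef bool (*d3_check_fn)(const struct model_dev *);

static void model_probe(struct model_dev *dev, d3_check_fn check)
{
	if (check(dev))
		dev->pm_usage--;	/* model: probe drops a reference */
}

static void model_remove(struct model_dev *dev, d3_check_fn check)
{
	if (check(dev))
		dev->pm_usage++;	/* model: remove takes it back */
}

/* Probe while present, unplug, then remove; returns the leftover count. */
static int model_unplug_cycle(d3_check_fn check)
{
	struct model_dev dev = {
		.accessible = true,
		.slot_cap = SLOT_CAP_HOTPLUG,
		.is_hotplug_bridge = true,
		.pm_usage = 0,
	};

	model_probe(&dev, check);
	dev.accessible = false;		/* surprise removal */
	model_remove(&dev, check);
	return dev.pm_usage;
}
```

In this model, model_unplug_cycle(d3_possible_reread) leaves a nonzero
count because the check flips between probe and remove, whereas
model_unplug_cycle(d3_possible_cached) ends balanced at zero.
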
There's only one other caller of pciehp_is_native() and that's
hotplug_is_native(). Only that other caller needs the register read,
so it should be moved there.

So I think the question of whether the pci_bridge_d3_possible() return
value should be cached is orthogonal to this patch.

Thanks,

Lukas