On Thu, Jun 19, 2025 at 09:55:35PM -0500, Mario Limonciello wrote:
> When a USB4 dock is unplugged the PCIe bridge it's connected to will
> remove issue a "Link Down" and "Card not detected event". The PCI core
> will treat this as a surprise hotplug event and unconfigure all downstream
> devices.
>
> pci_stop_bus_device() will call device_release_driver(). As part of device
> release sequence pm_runtime_put_sync() is called for the device which will
> decrement the runtime counter to 0. After this, the device remove callback
> (pci_device_remove()) will be called which again calls pm_runtime_put_sync()
> but as the counter is already 0 will cause an underflow.
>
> This behavior was introduced in commit 967577b062417 ("PCI/PM: Keep runtime
> PM enabled for unbound PCI devices") to prevent asymmetrical get/put from
> probe/remove, but this misses out on the point that when releasing a driver
> the usage count is decremented from the device core.
>
> Drop the extra call from pci_device_remove().
>
> Fixes: 967577b062417 ("PCI/PM: Keep runtime PM enabled for unbound PCI devices")

This doesn't look right. The refcount underflow issue seems new,
we surely haven't been doing the wrong thing since 2012.

> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -478,9 +478,6 @@ static void pci_device_remove(struct device *dev)
>  	pci_dev->driver = NULL;
>  	pci_iov_remove(pci_dev);
>
> -	/* Undo the runtime PM settings in local_pci_probe() */
> -	pm_runtime_put_sync(dev);
> -

local_pci_probe() increases the refcount to keep the device in D0.
If the driver wants to use runtime suspend, it needs to decrement
the refcount on ->probe() and re-increment on ->remove().

In the dmesg output attached to...

https://bugzilla.kernel.org/show_bug.cgi?id=220216

... the device exhibiting the refcount underflow is a PCIe port.
Are you also seeing this on a PCIe port or is it a different device?

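To make the get/put contract described above concrete, here is a minimal
user-space sketch. It is a model, not kernel code: every name in it
(model_dev, model_get, model_put, model_bind_unbind) is hypothetical, and
the two booleans merely stand in for whatever pci_bridge_d3_possible()
returns at probe time and at remove time. Clamping the counter at 0 on
underflow is a modelling choice made for illustration.

```c
/* User-space model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct model_dev {
	int usage_count;	/* stands in for the runtime PM usage counter */
	bool underflow;		/* set when a put finds the counter already 0 */
};

static void model_get(struct model_dev *d)
{
	d->usage_count++;
}

static void model_put(struct model_dev *d, const char *who)
{
	assert(d->usage_count >= 0);	/* invariant of this model */
	if (d->usage_count == 0) {
		/* Gratuitous put: record it and leave the counter at 0. */
		d->underflow = true;
		fprintf(stderr, "underflow: gratuitous put from %s\n", who);
		return;
	}
	d->usage_count--;
}

/*
 * One bind/unbind cycle. d3_on_probe and d3_on_remove model the
 * condition the driver checks before touching the refcount.
 */
static struct model_dev model_bind_unbind(bool d3_on_probe, bool d3_on_remove)
{
	struct model_dev d = { 0, false };

	model_get(&d);				/* core: keep device in D0 */
	if (d3_on_probe)
		model_put(&d, "driver probe");	/* driver opts in to runtime suspend */

	if (d3_on_remove)
		model_get(&d);			/* driver undoes its opt-in */
	model_put(&d, "core remove");		/* core undoes its own get */

	return d;
}
```

In this model, a cycle where the condition is true on probe and false on
remove ends with a gratuitous put, while the opposite mismatch leaves the
counter at 1. A cycle where both values agree ends balanced at 0.
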
So the refcount decrement happens in pcie_portdrv_probe() and
the refcount increment happens in pcie_portdrv_remove().
Both times it's conditional on pci_bridge_d3_possible().
Does that return a different value on probe versus remove?

Do any of the port service drivers decrement the refcount
once too often? I've just looked through pciehp but cannot
find anything out of the ordinary.

Looking through recent changes, 002bf2fbc00e and bca84a7b93fd
look like potential candidates causing a regression,
but the former is for AER (which isn't used in the dmesg
attached to the bugzilla) and the latter touches suspend on
system sleep, not runtime suspend.

Can you maybe instrument the pm_runtime_{get,put}*() functions
with a printk() and/or dump_stack() to see where a gratuitous
refcount decrement occurs?

Alternatively, is there a known-good kernel version which does not
exhibit the issue and which could serve as an anchor for git bisect?

Thanks,

Lukas