On Fri, Apr 11, 2025 at 03:28:15PM -0700, Sathyanarayanan Kuppuswamy wrote: > On 4/10/25 8:27 AM, Lukas Wunner wrote: > > Introduce two helpers to annotate code sections which cause spurious link > > changes: pci_hp_ignore_link_change() and pci_hp_unignore_link_change() > > Use those helpers in lieu of masking interrupts in the Slot Control > > register. > > > > Introduce a helper to check whether such a code section is executing > > concurrently and if so, await it: pci_hp_spurious_link_change() > > Invoke the helper in the hotplug IRQ thread pciehp_ist(). Re-use the > > IRQ thread's existing code which ignores DPC-induced link changes unless > > the link is unexpectedly down after reset recovery or the device was > > replaced during the bus reset. > > Since most of the changes in this patch is related to adding framework to > ignore spurious hotplug event, why not split it in to a separate patch? The idea is to ease backporting. The issue fixes VFIO passthrough on v6.13-rc1 and newer, so the patch will have to be backported at least to v6.13, v6.14, v6.15. > Is this fix tested in any platform? Yes, Joel Mathew Thomas successfully tested it: https://bugzilla.kernel.org/show_bug.cgi?id=219765 That's an Asus TUF FA507NU dual-GPU laptop (AMD CPU + Nvidia discrete GPU). The Nvidia GPU is passed through to a VM. > > --- a/drivers/pci/pci.h > > +++ b/drivers/pci/pci.h > > @@ -227,6 +227,7 @@ static inline bool pcie_downstream_port(const struct pci_dev *dev) > > /* Functions for PCI Hotplug drivers to use */ > > int pci_hp_add_bridge(struct pci_dev *dev); > > +bool pci_hp_spurious_link_change(struct pci_dev *pdev); > > Do you expect this function used outside hotplug code? If not why not leave > the declaration in pciehp.h? Any hotplug driver may call this. In particular, there are two other drivers implementing the ->reset_slot() callback: pnv_php.c and s390_pci_hpc.c pnv_php.c does a similar dance as pciehp_hpc.c (before this patch): It disables the interrupt, performs a Secondary Bus Reset, clears spurious events and re-enables the interrupt. I think it can be converted to use the newly introduced API. That should make it more robust against removal or replacement of the device during the Secondary Bus Reset. Also, to cope with runtime suspend to D3cold, there's an ignore_hotplug bit in struct pci_dev. The bit is set by drivers for discrete GPUs and is honored by acpiphp and pciehp. I'd like to remove the bit in favor of the new mechanism introduced here and that means acpiphp will have to be converted to call pci_hp_spurious_link_change(). One thing that's problematic about the current behavior is that hotplug events are ignored wholesale for GPUs which runtime suspend to D3cold. That works for discrete GPUs in laptops which are soldered down on the mainboard, but doesn't work for GPUs which are plugged into an actual hotplug port, e.g. data center GPUs. The new API will allow detecting and ignoring spurious events in a more surgical manner. I envision that __pci_set_power_state() will call pci_hp_ignore_link_change() if the target power state is D3cold. Also vga_switcheroo.c will have to call that. But none of the GPU drivers will have to call pci_ignore_hotplug() anymore. To summarize, there are at least two other hotplug drivers besides pciehp which I expect will call pci_hp_spurious_link_change() in the foreseeable future, acpiphp and pnv_php, hence the declaration is not in pciehp.h but in drivers/pci/pci.h. > > --- a/include/linux/pci.h > > +++ b/include/linux/pci.h > > @@ -1848,6 +1848,14 @@ static inline void pcie_no_aspm(void) { } > > static inline bool pcie_aspm_enabled(struct pci_dev *pdev) { return false; } > > #endif > > +#ifdef CONFIG_HOTPLUG_PCI > > +void pci_hp_ignore_link_change(struct pci_dev *pdev); > > +void pci_hp_unignore_link_change(struct pci_dev *pdev); > > +#else > > +static inline void pci_hp_ignore_link_change(struct pci_dev *pdev) { } > > +static inline void pci_hp_unignore_link_change(struct pci_dev *pdev) { } > > +#endif > > + > > Generally we expose APIs when we really need it. Since you have already > identified some use cases where you will use it in other drivers, why not > include one such change as an example? Mostly because I wanted to get this fix out the door quickly before more people come across the regression it addresses. Converting the Mellanox Ethernet driver to use the API would require an ack from its maintainers. Since it's not urgent, I was planning to do that in a subsequent cycle. Same for the conversion of D3cold handling. Thanks, Lukas