在 2025/6/19 19:52, Lukas Wunner 写道: > On Thu, Jun 19, 2025 at 05:32:28PM +0800, Hongbo Yao wrote: >> Fixed 1-second delay in remove_board() fails to accommodate certain >> hardware like multi-host OCP cards, which exhibit longer power-off >> latencies. > > Please name the affected product(s). > > They don't seem to comply to the spec. How prevalent are they? > If there are only few deployed, quirks like this are probably > best addressed by an out-of-tree patch. > Hi Lukas, Thank you for reviewing the patch. The affected hardware configuration: - Host system: Arm Neoverse N2 based server - Multi-host OCP card: Mellanox Technologies MT2910 Family [ConnectX-7] >> Logs before fix: >> [157.778307] pcieport 0003:00:00.0: pciehp: pending interrupts 0x0001 from Slot Status >> [157.778321] pcieport 0003:00:00.0: pciehp: Slot(31): Attention button pressed >> [157.785445] pcieport 0003:00:00.0: pciehp: Slot(31): Powering off due to button press >> [157.798931] pcieport 000b:00:02.0: pciehp: pending interrupts 0x0001 from Slot Status > > This log excerpt mixes messages from two separate hotplug ports > (0003:00:00.0 and 000b:00:02.0). Are these hotplug ports related? > If not, please reduce the log excerpt to a single hotplug port > to avoid confusion. > Sorry for not providing adequate context in the patch submission. Yes, these two hotplug ports are related - they are part of the same physical multi-host OCP card. Key points: 1. The OCP card has two independent PCIe endpoints 2. Each endpoint connected to a PCIe root port: - Endpoint 1 → Port 0003:00:00.0 - Endpoint 2 → Port 000b:00:02.0 3. Both endpoints share a common power domain 4. Full power-off occurs only after BOTH endpoints are powered down 5. DLLSC is triggered only after complete power-off Critical log events: [157.778307] Both ports: Attention button pressed [167.540342] Port 0003:00:00.0 power off command issued [172.289366] Port 000b:00:02.0 power off command issued [172.302385] Card fully powered off, trigger AER interrupts and DLLSC Full power-off occurs only after BOTH ports complete their sequences, taking about 5s total. >> --- a/drivers/pci/hotplug/pciehp_ctrl.c >> +++ b/drivers/pci/hotplug/pciehp_ctrl.c >> @@ -30,6 +30,25 @@ >> #define SAFE_REMOVAL true >> #define SURPRISE_REMOVAL false >> >> +static void pciehp_wait_for_link_inactive(struct controller *ctrl) >> +{ >> + u16 lnk_status; >> + int timeout = 10000, step = 20; >> + >> + do { >> + pcie_capability_read_word(ctrl->pcie->port, PCI_EXP_LNKSTA, >> + &lnk_status); >> + >> + if (!(lnk_status & PCI_EXP_LNKSTA_DLLLA)) >> + return; >> + >> + msleep(step); >> + timeout -= step; >> + } while (timeout >= 0); >> + >> + ctrl_dbg(ctrl, "Timeout waiting for link inactive state\n"); >> +} > > Any chance you can use one of the existing helpers, such as > pcie_wait_for_link()? > > Is the 10 second delay chosen arbitrarily or how did you come up > with it? How much time do the affected products really need? > Ok, I will try to use pcie_wait_for_link(). The 10-second timeout was determined from actual log observations. The power-off process for the multi-host OCP card takes approximately 5-9 seconds in our measurements. >> @@ -119,8 +138,11 @@ static void remove_board(struct controller *ctrl, bool safe_removal) >> * After turning power off, we must wait for at least 1 second >> * before taking any action that relies on power having been >> * removed from the slot/adapter. >> + * >> + * Extended wait with polling to ensure hardware has completed >> + * power-off sequence. >> */ >> - msleep(1000); >> + pciehp_wait_for_link_inactive(ctrl); >> >> /* Ignore link or presence changes caused by power off */ >> atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC), > > Please keep the msleep(1000), that's the minimum we need to wait > per PCIe r6.3 sec 6.7.1.8. > > Please make the extra wait for link down conditional on > ctrl->pcie->port->link_active_reporting. (DLLLA reporting is > optional for hotplug ports conforming to older spec revisions.) > Thank you for the valuable suggestion. i'll revise the patch Best regards, Hongbo.> Thanks, > > Lukas > >