Re: [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check

When multiple devices attached to a PCIe switch downstream port are passed through via the vfio module,
we can initiate a secondary bus reset (__pci_reset_bus -> pci_bridge_secondary_bus_reset)
using the vfio VFIO_DEVICE_PCI_HOT_RESET ioctl. However, all devices must have completed
reset and initialization before pci_bridge_secondary_bus_reset returns. Otherwise,
accessing a device that has not finished resetting can trigger a device error or even cause the device to go offline.

Therefore, pci_bridge_secondary_bus_reset needs to wait until all devices have completed reset.
(The [RFC] patch above still needs adjustments to handle cases such as locks being held for a long time and devices going offline unexpectedly.)
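
To illustrate the requirement, here is a minimal user-space simulation, not kernel code. All names in it (struct fake_dev, fake_read_vendor_id(), wait_dev_ready(), wait_all_ready()) are hypothetical and exist only for this sketch. It assumes a device that is not yet ready answers Vendor ID reads with 0xFFFF (no response) or 0x0001 (retry status made visible to software), and it polls each device with an exponentially growing delay until a per-device timeout expires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ID_NO_RESPONSE 0xFFFFu /* all-ones: the read got no answer */
#define VENDOR_ID_RETRY       0x0001u /* reserved ID: device asks for a retry */

/* Hypothetical stand-in for a device below the bridge that was just reset. */
struct fake_dev {
	uint16_t vendor_id;       /* ID reported once the device is ready */
	unsigned not_ready_reads; /* reads that still fail after the reset */
};

/* Simulated config-space read of the Vendor ID register. */
static uint16_t fake_read_vendor_id(struct fake_dev *dev)
{
	if (dev->not_ready_reads > 0) {
		dev->not_ready_reads--;
		return VENDOR_ID_NO_RESPONSE;
	}
	return dev->vendor_id;
}

/*
 * Poll one device with exponential backoff (1, 2, 4, ... ms).
 * Returns true if the device answers before the accumulated delay
 * would exceed timeout_ms. The delay is only accounted for here;
 * a real implementation would sleep instead.
 */
static bool wait_dev_ready(struct fake_dev *dev, unsigned timeout_ms)
{
	unsigned delay_ms = 1, waited_ms = 0;

	assert(dev != NULL);
	for (;;) {
		uint16_t id = fake_read_vendor_id(dev);

		if (id != VENDOR_ID_NO_RESPONSE && id != VENDOR_ID_RETRY)
			return true;
		if (waited_ms + delay_ms > timeout_ms)
			return false;
		waited_ms += delay_ms;
		delay_ms *= 2;
	}
}

/* The reset counts as complete only when every device is ready. */
static bool wait_all_ready(struct fake_dev *devs, size_t n, unsigned timeout_ms)
{
	for (size_t i = 0; i < n; i++)
		if (!wait_dev_ready(&devs[i], timeout_ms))
			return false;
	return true;
}
```

In this sketch a single device that never becomes ready makes wait_all_ready() fail, which corresponds to the case where the reset path should report an error instead of letting the caller touch the device.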

Thanks

------------------------------------------------------------------
From:Lukas Wunner <lukas@xxxxxxxxx>
Send Time: Monday, August 18, 2025, 14:32
To:Guanghui Feng<guanghuifeng@xxxxxxxxxxxxxxxxx>
CC:bhelgaas<bhelgaas@xxxxxxxxxx>; "alikernel-developer"<alikernel-developer@xxxxxxxxxxxxxxxxx>; "linux-pci"<linux-pci@xxxxxxxxxxxxxxx>
Subject:Re: [PATCH] [RFC] PCI: fix pcie secondary bus reset readiness check


On Mon, Aug 18, 2025 at 12:06:40PM +0800, Guanghui Feng wrote:
> When executing a secondary bus reset on a bridge downstream port, all
> downstream devices and switches will be reset. Before
> pci_bridge_secondary_bus_reset returns, ensure that all available
> devices have completed reset and initialization. Otherwise, using a
> device before its initialization has completed will result in errors or
> even the device going offline.

I recently received a report off-list for what looks like the same issue
and came up with the patch below.

Would it fix the issue for you?

It's not yet a properly fleshed-out patch, just a proof of concept.
But it's smaller and simpler than the approach you've taken.

This patch is for a Secondary Bus Reset issued by AER.  Is the bus reset
likewise happening through AER in your case or what's the code path
leading to the bus reset?

-- >8 --

diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index fa83ebd..8b427a9 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -761,6 +761,10 @@ static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
 
 	pci_restore_state(dev);
 	pci_save_state(dev);
+
+	if (pci_bridge_wait_for_secondary_bus(dev, "hot reset"))
+		return PCI_ERS_RESULT_DISCONNECT;
+
 	return PCI_ERS_RESULT_RECOVERED;
 }
 





