On 3/29/2025 9:23 PM, Lukas Wunner wrote:
The following was applied as commit eabe8e5e88f5 ("net/mlx5: Handle
sync reset now event").
It does some questionable things (from a PCI perspective), so allow
me to ask for details:
On Wed, Oct 07, 2020 at 09:00:50AM +0300, Moshe Shemesh wrote:
On sync_reset_now event the driver does reload and PCI link toggle to
activate firmware upgrade reset. When the firmware sends this event it
syncs the event on all PFs, so all PFs will do PCI link toggle at once.
To do PCI link toggle, the driver ensures that no other device ID under
the same bridge by checking that all the PF functions under the same PCI
bridge have same device ID. If no other device it uses PCI bridge link
control to turn link down and up.
[...]
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -156,6 +157,120 @@ static void mlx5_sync_reset_request_event(struct work_struct *work)
mlx5_core_warn(dev, "PCI Sync FW Update Reset Ack. Device reset is expected.\n");
}
+#define MLX5_PCI_LINK_UP_TIMEOUT 2000
+
+static int mlx5_pci_link_toggle(struct mlx5_core_dev *dev)
+{
+ struct pci_bus *bridge_bus = dev->pdev->bus;
+ struct pci_dev *bridge = bridge_bus->self;
+ u16 reg16, dev_id, sdev_id;
+ unsigned long timeout;
+ struct pci_dev *sdev;
+ int cap, err;
+ u32 reg32;
+
+ /* Check that all functions under the pci bridge are PFs of
+ * this device otherwise fail this function.
+ */
+ err = pci_read_config_word(dev->pdev, PCI_DEVICE_ID, &dev_id);
+ if (err)
+ return err;
+ list_for_each_entry(sdev, &bridge_bus->devices, bus_list) {
+ err = pci_read_config_word(sdev, PCI_DEVICE_ID, &sdev_id);
+ if (err)
+ return err;
+ if (sdev_id != dev_id)
+ return -EPERM;
+ }
+
+ cap = pci_find_capability(bridge, PCI_CAP_ID_EXP);
+ if (!cap)
+ return -EOPNOTSUPP;
+
+ list_for_each_entry(sdev, &bridge_bus->devices, bus_list) {
+ pci_save_state(sdev);
+ pci_cfg_access_lock(sdev);
+ }
+ /* PCI link toggle */
+ err = pci_read_config_word(bridge, cap + PCI_EXP_LNKCTL, ®16);
+ if (err)
+ return err;
+ reg16 |= PCI_EXP_LNKCTL_LD;
+ err = pci_write_config_word(bridge, cap + PCI_EXP_LNKCTL, reg16);
+ if (err)
+ return err;
+ msleep(500);
+ reg16 &= ~PCI_EXP_LNKCTL_LD;
+ err = pci_write_config_word(bridge, cap + PCI_EXP_LNKCTL, reg16);
+ if (err)
+ return err;
Sorry for late response.
The commit message doesn't state the reason why you're toggling
the Link Disable bit.
It propagates a Hot Reset down the hierarchy, so perhaps that's
the reason you're doing this?
If it is, why didn't you just use one of the existing library calls
such as pci_reset_bus(bridge)?
We need PCI link down on all the device functions (can be 2 or even 4)
for full chip reset, to activate new FW. The hot reset by
pci_reset_bus() has the link down only for 2 ms, not enough for sync
reset on all functions with pci link down for new FW load and boot.
Thanks,
Lukas