Hello Wilfred, Nice to see this feature :) On Wed, Apr 30, 2025 at 05:43:51PM +1000, Wilfred Mallawa wrote: > From: Wilfred Mallawa <wilfred.mallawa@xxxxxxx> > > The PCIe link may go down in cases like firmware crashes or unstable > connections. When this occurs, the PCIe slot must be reset to restore > functionality. However, the current driver lacks link down handling, > forcing users to reboot the system to recover. > > This patch implements the `reset_slot` callback for link down handling > for DWC PCIe host controller. In which, the RC is reset, reconfigured > and link training initiated to recover from the link down event. > > This patch by extension fixes issues with sysfs initiated bus resets. > In that, currently, when a sysfs initiated bus reset is issued, the > endpoint device is non-functional after (may link up with downgraded link > status). With this patch adding support for link down recovery, a sysfs > initiated bus reset works as intended. Testing conducted on a ROCK5B board > with an M.2 NVMe drive. > > Signed-off-by: Wilfred Mallawa <wilfred.mallawa@xxxxxxx> > --- > Hey all, > > This patch builds ontop of [1] to extend the reset slot support for the > DWC PCIe host controller. Which implements link down recovery support > for the DesignWare PCIe host controller by adding a `reset_slot` callback. > This allows the system to recover from PCIe link failures without requiring a reboot. > > This patch by extension improves the behavior of sysfs-initiated bus resets. > Previously, a `reset` issued via sysfs could leave the endpoint in a > non-functional state or with downgraded link parameters. With the added > link down recovery logic, sysfs resets now result in a properly reinitialized > and fully functional endpoint device. This issue was discovered on a > Rock5B board, and thus testing was also conducted on the same platform > with a known good M.2 NVMe drive. > > Thanks! > > [1] https://lore.kernel.org/all/20250417-pcie-reset-slot-v3-0-59a10811c962@xxxxxxxxxx/ > --- > drivers/pci/controller/dwc/Kconfig | 1 + > drivers/pci/controller/dwc/pcie-dw-rockchip.c | 89 ++++++++++++++++++++++++++- > 2 files changed, 88 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/controller/dwc/Kconfig b/drivers/pci/controller/dwc/Kconfig > index d9f0386396edf66ad0e514a0f545ed24d89fcb6c..878c52de0842e32ca50dfcc4b66231a73ef436c4 100644 > --- a/drivers/pci/controller/dwc/Kconfig > +++ b/drivers/pci/controller/dwc/Kconfig > @@ -347,6 +347,7 @@ config PCIE_ROCKCHIP_DW_HOST > depends on OF > select PCIE_DW_HOST > select PCIE_ROCKCHIP_DW > + select PCI_HOST_COMMON > help > Enables support for the DesignWare PCIe controller in the > Rockchip SoC (except RK3399) to work in host mode. > diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c > index 3c6ab71c996ec1246954f52a9454c8ae67956a54..4c2c683d170f19266e1dfe0efde76d6feb23bf7a 100644 > --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c > +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c > @@ -23,6 +23,8 @@ > #include <linux/reset.h> > > #include "pcie-designware.h" > +#include "../../pci.h" > +#include "../pci-host-common.h" > > /* > * The upper 16 bits of PCIE_CLIENT_CONFIG are a write > @@ -83,6 +85,9 @@ struct rockchip_pcie_of_data { > const struct pci_epc_features *epc_features; > }; > > +static int rockchip_pcie_rc_reset_slot(struct pci_host_bridge *bridge, > + struct pci_dev *pdev); > + > static int rockchip_pcie_readl_apb(struct rockchip_pcie *rockchip, u32 reg) > { > return readl_relaxed(rockchip->apb_base + reg); > @@ -256,6 +261,7 @@ static int rockchip_pcie_host_init(struct dw_pcie_rp *pp) > rockchip); > > rockchip_pcie_enable_l0s(pci); > + pp->bridge->reset_slot = rockchip_pcie_rc_reset_slot; > > return 0; > } > @@ -455,6 +461,11 @@ static irqreturn_t rockchip_pcie_rc_sys_irq_thread(int irq, void *arg) > dev_dbg(dev, "PCIE_CLIENT_INTR_STATUS_MISC: %#x\n", reg); > dev_dbg(dev, "LTSSM_STATUS: %#x\n", rockchip_pcie_get_ltssm(rockchip)); > > + if (reg & PCIE_LINK_REQ_RST_NOT_INT) { > + dev_dbg(dev, "hot reset or link-down reset\n"); > + pci_host_handle_link_down(pp->bridge); > + } > + > if (reg & PCIE_RDLH_LINK_UP_CHGED) { > if (rockchip_pcie_link_up(pci)) { > dev_dbg(dev, "Received Link up event. Starting enumeration!\n"); > @@ -536,8 +547,8 @@ static int rockchip_pcie_configure_rc(struct platform_device *pdev, > return ret; > } > > - /* unmask DLL up/down indicator */ > - val = HIWORD_UPDATE(PCIE_RDLH_LINK_UP_CHGED, 0); > + /* unmask DLL up/down indicator and hot reset/link-down reset irq */ > + val = HIWORD_UPDATE(PCIE_RDLH_LINK_UP_CHGED | PCIE_LINK_REQ_RST_NOT_INT, 0); > rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_INTR_MASK_MISC); > > return ret; > @@ -688,6 +699,80 @@ static int rockchip_pcie_probe(struct platform_device *pdev) > return ret; > } > > +static int rockchip_pcie_rc_reset_slot(struct pci_host_bridge *bridge, > + struct pci_dev *pdev) > +{ > + struct pci_bus *bus = bridge->bus; > + struct dw_pcie_rp *pp = bus->sysdata; > + struct dw_pcie *pci = to_dw_pcie_from_pp(pp); > + struct rockchip_pcie *rockchip = to_rockchip_pcie(pci); > + struct device *dev = rockchip->pci.dev; > + u32 val; > + int ret; > + > + dw_pcie_stop_link(pci); > + rockchip_pcie_phy_deinit(rockchip); > + > + ret = reset_control_assert(rockchip->rst); > + if (ret) > + return ret; > + > + ret = rockchip_pcie_phy_init(rockchip); > + if (ret) > + goto disable_regulator; > + > + ret = reset_control_deassert(rockchip->rst); > + if (ret) > + goto deinit_phy; > + > + ret = rockchip_pcie_clk_init(rockchip); > + if (ret) > + goto deinit_phy; Here you call rockchip_pcie_clk_init(), but I don't see that we ever called clk_bulk_disable_unprepare() after stopping the link. I would have expected the clk framework to have given us a warning about enabling clocks that are already enabled. > + > + ret = pp->ops->init(pp); > + if (ret) { > + dev_err(dev, "Host init failed: %d\n", ret); > + goto deinit_clk; > + } > + > + /* LTSSM enable control mode */ > + val = HIWORD_UPDATE_BIT(PCIE_LTSSM_ENABLE_ENHANCE); > + rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_HOT_RESET_CTRL); > + > + rockchip_pcie_writel_apb(rockchip, PCIE_CLIENT_RC_MODE, PCIE_CLIENT_GENERAL_CON); > + > + ret = dw_pcie_setup_rc(pp); > + if (ret) { > + dev_err(dev, "Failed to setup RC: %d\n", ret); > + goto deinit_clk; > + } > + > + /* unmask DLL up/down indicator and hot reset/link-down reset irq */ > + val = HIWORD_UPDATE(PCIE_RDLH_LINK_UP_CHGED | PCIE_LINK_REQ_RST_NOT_INT, 0); > + rockchip_pcie_writel_apb(rockchip, val, PCIE_CLIENT_INTR_MASK_MISC); > + > + ret = dw_pcie_start_link(pci); > + if (ret) > + return ret; Here you are simply returning ret in case of error, should we perhaps goto error if this function fails? > + > + ret = dw_pcie_wait_for_link(pci); > + if (ret) > + return ret; Here, I think that we should simply call dw_pcie_wait_for_link() without checking for error. 1) That is what Mani does in the qcom patch that implements the reset_slot() callback. 2) That is what we do in: https://github.com/torvalds/linux/blob/master/drivers/pci/controller/dwc/pcie-designware-host.c#L558-L559 (Ignore errors, the link may come up later) Kind regards, Niklas