On 2/21/25 04:39, Srirangan Madhavan wrote:
Type 2 devices are being introduced and will require finer-grained
reset mechanisms beyond bus-wide reset methods.
Add support for CXL reset per CXL v3.2 Section 9.6/9.7
Hi,
This seems too early as there is no plans for supporting CXL.cache yet.
It is the second part of the current ongoing type2 support though.
I guess starting the discussion about how to proceed is not a problem,
so my comments below, but my first comment is if the decisions about
what to do should be generic. I think having some helpers for accel
drivers will be good, but the invocation should be in the hands of the
accel driver.
Signed-off-by: Srirangan Madhavan <smadhavan@xxxxxxxxxx>
---
drivers/pci/pci.c | 146 ++++++++++++++++++++++++++++++++++++++++++++
include/cxl/pci.h | 40 ++++++++----
include/linux/pci.h | 2 +-
3 files changed, 174 insertions(+), 14 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 3ab1871ecf8a..9108daae252b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5117,6 +5117,151 @@ static int cxl_reset_bus_function(struct pci_dev *dev, bool probe)
return rc;
}
+static int cxl_reset_prepare(struct pci_dev *dev, u16 dvsec)
+{
+ u32 timeout_us = 100, timeout_tot_us = 10000;
+ u16 reg, cap;
+ int rc;
+
+ if (!pci_wait_for_pending_transaction(dev))
+ pci_err(dev, "timed out waiting for pending transaction; performing cxl reset anyway\n");
+
+ /* Check if the device is cache capable. */
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, &cap);
+ if (rc)
+ return rc;
+
+ if (!(cap & CXL_DVSEC_CACHE_CAPABLE))
+ return 0;
+
+ /* Disable cache. WB and invalidate cache if capability is advertised */
I do not know how safe is this. IMO, this needs to be synchronized by
the accel driver which could imply to tell user space first. In our case
it would imply to stop rx queues in the netdev for CXL.cache, and tx
queues for CXL.mem. Doing it unconditionally could make current CXL
transactions to stall ... although it could be argued the reset event
implies something is broken, but let's try to do it properly if there is
a chance of the system not unreliable.
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, ®);
+ if (rc)
+ return rc;
+ reg |= CXL_DVSEC_DISABLE_CACHING;
+ /*
+ * DEVCTL2 bits are written only once. So check WB+I capability while
+ * keeping disable caching set.
+ */
+ if (cap & CXL_DVSEC_CACHE_WBI_CAPABLE)
+ reg |= CXL_DVSEC_INIT_CACHE_WBI;
+ pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
+
+ /*
+ * From Section 9.6: "Software may leverage the cache size reported in
+ * the DVSEC CXL Capability2 register to compute a suitable timeout
+ * value".
+ * Given there is no conversion factor for cache size -> timeout,
+ * setting timer for default 10ms.
+ */
+ do {
+ if (timeout_tot_us == 0)
+ return -ETIMEDOUT;
+ usleep_range(timeout_us, timeout_us + 1);
+ timeout_tot_us -= timeout_us;
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET,
+ ®);
+ if (rc)
+ return rc;
+ } while (!(reg & CXL_DVSEC_CACHE_INVALID));
+
+ return 0;
+}
+
+static int cxl_reset_init(struct pci_dev *dev, u16 dvsec)
I think cxl_reset_start after the *_prepare call makes more sense here.
+{
+ /*
+ * Timeout values ref CXL Spec v3.2 Ch 8 Control and Status Registers,
+ * under section 8.1.3.1 DVSEC CXL Capability.
+ */
+ u32 reset_timeouts_ms[] = { 10, 100, 1000, 10000, 100000 };
+ u16 reg;
+ u32 timeout_ms;
+ int rc, ind;
+
+ /* Check if CXL Reset MEM CLR is supported. */
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, ®);
+ if (rc)
+ return rc;
+
+ if (reg & CXL_DVSEC_CXL_RST_MEM_CLR_CAPABLE) {
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET,
+ ®);
+ if (rc)
+ return rc;
+
+ reg |= CXL_DVSEC_CXL_RST_MEM_CLR_ENABLE;
+ pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
+ }
+
+ /* Read timeout value. */
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, ®);
+ if (rc)
+ return rc;
+ ind = FIELD_GET(CXL_DVSEC_CXL_RST_TIMEOUT_MASK, reg);
+ timeout_ms = reset_timeouts_ms[ind];
+
+ /* Write reset config. */
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, ®);
+ if (rc)
+ return rc;
+
+ reg |= CXL_DVSEC_INIT_CXL_RESET;
+ pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
+
+ /* Wait till timeout and then check reset status is complete. */
+ msleep(timeout_ms);
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_STATUS2_OFFSET, ®);
+ if (rc)
+ return rc;
+ if (reg & CXL_DVSEC_CXL_RESET_ERR ||
+ ~reg & CXL_DVSEC_CXL_RST_COMPLETE)
+ return -ETIMEDOUT;
+
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, ®);
+ if (rc)
+ return rc;
+ reg &= (~CXL_DVSEC_DISABLE_CACHING);
+ pci_write_config_word(dev, dvsec + CXL_DVSEC_CTRL2_OFFSET, reg);
+
+ return 0;
+}
+
+/**
+ * cxl_reset - initiate a cxl reset
+ * @dev: device to reset
+ * @probe: if true, return 0 if device can be reset this way
+ *
+ * Initiate a cxl reset on @dev.
+ */
+static int cxl_reset(struct pci_dev *dev, bool probe)
+{
+ u16 dvsec, reg;
+ int rc;
+
+ dvsec = pci_find_dvsec_capability(dev, PCI_VENDOR_ID_CXL,
+ CXL_DVSEC_PCIE_DEVICE);
+ if (!dvsec)
+ return -ENOTTY;
+
+ /* Check if CXL Reset is supported. */
+ rc = pci_read_config_word(dev, dvsec + CXL_DVSEC_CAP_OFFSET, ®);
+ if (rc)
+ return -ENOTTY;
+
+ if (reg & CXL_DVSEC_CXL_RST_CAPABLE == 0)
+ return -ENOTTY;
+
+ if (probe)
+ return 0;
+
+ rc = cxl_reset_prepare(dev, dvsec);
+ if (rc)
+ return rc;
+
+ return cxl_reset_init(dev, dvsec);
One thing this does not cover, and I do not know if it should, is the
fact that the device CXL regs will be reset, so the question is if the
old values should be restored or the device/driver should go through the
same initialization, if a hotplug device, or do it specifically if
already present at boot time and the BIOS doing that first
initialization. In one case the restoration needs to happen, in the
other the old values/objects need to be removed. I think the second case
is more problematic because this is likely involving CXL root complex
configuration performed by the BIOS ... Not trivial at all IMO.
This is the main concern I expressed some time ago when looking at how
type2 should support resets.
Anyway, thank you for sending this series which will foster further
discussions about all this.