On 6/27/2025 7:27 AM, Shiju Jose wrote:
>> -----Original Message-----
>> From: Terry Bowman <terry.bowman@xxxxxxx>
>> Sent: 26 June 2025 23:43
>> To: dave@xxxxxxxxxxxx; Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>;
>> dave.jiang@xxxxxxxxx; alison.schofield@xxxxxxxxx; dan.j.williams@xxxxxxxxx;
>> bhelgaas@xxxxxxxxxx; Shiju Jose <shiju.jose@xxxxxxxxxx>;
>> ming.li@xxxxxxxxxxxx; Smita.KoralahalliChannabasappa@xxxxxxx;
>> rrichter@xxxxxxx; dan.carpenter@xxxxxxxxxx;
>> PradeepVineshReddy.Kodamati@xxxxxxx; lukas@xxxxxxxxx;
>> Benjamin.Cheatham@xxxxxxx;
>> sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx; terry.bowman@xxxxxxx;
>> linux-cxl@xxxxxxxxxxxxxxx
>> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx
>> Subject: [PATCH v10 07/17] CXL/PCI: Introduce CXL uncorrectable protocol error
>> recovery
>>
>> Create cxl_do_recovery() to provide uncorrectable protocol error (UCE)
>> handling. Follow similar design as found in PCIe error driver,
>> pcie_do_recovery(). One difference is cxl_do_recovery() will treat all UCEs as
>> fatal with a kernel panic. This is to prevent corruption on CXL memory.
>>
>> Export the PCI error driver's merge_result() to CXL namespace. Introduce
>> PCI_ERS_RESULT_PANIC and add support in merge_result() routine. This will be
>> used by CXL to panic the system in the case of uncorrectable protocol errors. PCI
>> error handling is not currently expected to use the PCI_ERS_RESULT_PANIC.
>>
>> Copy pci_walk_bridge() to cxl_walk_bridge(). Make a change to walk the first
>> device in all cases.
>>
>> Copy the PCI error driver's report_error_detected() to
>> cxl_report_error_detected().
>> Note, only CXL Endpoints and RCH Downstream Ports(RCH DSP) are currently
>> supported. Add locking for PCI device as done in PCI's report_error_detected().
>> This is necessary to prevent the RAS registers from disappearing before logging
>> is completed.
>>
>> Call panic() to halt the system in the case of uncorrectable errors (UCE) in
>> cxl_do_recovery(). Export pci_aer_clear_fatal_status() for CXL to use if a UCE is
>> not found. In this case the AER status must be cleared and uses
>> pci_aer_clear_fatal_status().
>>
>> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
>> ---
>>  drivers/cxl/core/native_ras.c | 44 +++++++++++++++++++++++++++++++++++
>>  drivers/pci/pcie/cxl_aer.c    |  3 ++-
>>  drivers/pci/pcie/err.c        |  8 +++++--
>>  include/linux/aer.h           | 11 +++++++++
>>  include/linux/pci.h           |  3 +++
>>  5 files changed, 66 insertions(+), 3 deletions(-)
>>
> [...]
>>  void pci_print_aer(struct pci_dev *dev, int aer_severity,
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 79326358f641..16a8310e0373 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -868,6 +868,9 @@ enum pci_ers_result {
>>
>>  	/* No AER capabilities registered for the driver */
>>  	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
>> +
>> +	/* System is unstable, panic. Is CXL specific */
>> +	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
> Extra space is present after casting?
>>  };

Hi Shiju,

I see the existing PCI_ERS_RESULT_* entries have a space before the number.
For example,

    PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
                                                             ^

I do see that I had an extra space in my comment, which I will fix.

Please let me know if you agree or if I'm missing something.

-Terry

>>
>>  /* PCI bus error event callbacks */
>> --
>> 2.34.1
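
For anyone following the thread without the full patch at hand, here is a rough user-space sketch of how a merge_result()-style helper could give the new PANIC result priority. This is not the code from the patch: the ERS_RESULT_* names and merge_result_sketch() are hypothetical stand-ins, the __force cast is dropped because it only matters to sparse, and the "PANIC overrides everything" ordering is an assumption based on the commit message. The numeric values mirror enum pci_ers_result, with 7 being the value the patch proposes.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for the kernel's enum pci_ers_result. */
enum ers_result {
	ERS_RESULT_NONE = 1,
	ERS_RESULT_CAN_RECOVER = 2,
	ERS_RESULT_NEED_RESET = 3,
	ERS_RESULT_DISCONNECT = 4,
	ERS_RESULT_RECOVERED = 5,
	ERS_RESULT_NO_AER_DRIVER = 6,
	ERS_RESULT_PANIC = 7,	/* value proposed by the patch */
};

/*
 * Fold one device's vote (incoming) into the running result (orig).
 * Assumption: a PANIC vote from any device overrides all other results.
 */
static enum ers_result merge_result_sketch(enum ers_result orig,
					   enum ers_result incoming)
{
	if (incoming == ERS_RESULT_PANIC)
		return ERS_RESULT_PANIC;

	if (incoming == ERS_RESULT_NO_AER_DRIVER)
		return ERS_RESULT_NO_AER_DRIVER;

	/* A device with no opinion leaves the running result unchanged. */
	if (incoming == ERS_RESULT_NONE)
		return orig;

	switch (orig) {
	case ERS_RESULT_CAN_RECOVER:
	case ERS_RESULT_RECOVERED:
		orig = incoming;
		break;
	case ERS_RESULT_DISCONNECT:
		if (incoming == ERS_RESULT_NEED_RESET)
			orig = ERS_RESULT_NEED_RESET;
		break;
	default:
		/* Includes PANIC: once set, later votes do not downgrade it. */
		break;
	}

	return orig;
}
```

With this ordering, a single PANIC vote during the bridge walk is enough for the caller to end up with the PANIC result, regardless of what the other devices report before or after it.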