On Tue, 26 Aug 2025 20:35:31 -0500 Terry Bowman <terry.bowman@xxxxxxx> wrote: > CXL Endpoint protocol errors are currently handled using PCI error > handlers. The CXL Endpoint requires CXL specific handling in the case of > uncorrectable error (UCE) handling not provided by the PCI handlers. > > Add CXL specific handlers for CXL Endpoints. Rename the existing > cxl_error_handlers to be pci_error_handlers to more correctly indicate > the error type and follow naming consistency. > > The PCI handlers will be called if the CXL device is not trained for > alternate protocol (CXL). Update the CXL Endpoint PCI handlers to call the > CXL UCE handlers. > > The existing EP UCE handler includes checks for various results. These are > no longer needed because CXL UCE recovery will not be attempted. Implement > cxl_handle_ras() to return PCI_ERS_RESULT_NONE or PCI_ERS_RESULT_PANIC. The > CXL UCE handler is called by cxl_do_recovery() that acts on the return > value. In the case of the PCI handler path, call panic() if the result is > PCI_ERS_RESULT_PANIC. > > Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx> > Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> > > --- > Changes in v10->v11: > - cxl_error_detected() - Change handlers' scoped_guard() to guard() (Jonathan) > - cxl_error_detected() - Remove extra line (Shiju) > - Changes moved to core/ras.c (Terry) > - cxl_error_detected(), remove 'ue' and return with function call. (Jonathan) > - Remove extra space in documentation for PCI_ERS_RESULT_PANIC definition > - Move #include "pci.h from cxl.h to core.h (Terry) > - Remove unnecessary includes of cxl.h and core.h in mem.c (Terry) Hi Terry, A few suggested renames inline just because things like pci_cor_error_detected() sounds like it should have nothing CXL specific in it. Whilst it is clunky to use cxl_pci_cor_error_detected() I think that is worth doing to avoid this potential confusion. > diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c > index 42b6e0b092d5..b285448c2d9c 100644 > --- a/drivers/cxl/core/ras.c > +++ b/drivers/cxl/core/ras.c > > -void cxl_cor_error_detected(struct pci_dev *pdev) > +void cxl_cor_error_detected(struct device *dev) > { > + struct pci_dev *pdev = to_pci_dev(dev); > struct cxl_dev_state *cxlds = pci_get_drvdata(pdev); > - struct device *dev = &cxlds->cxlmd->dev; > + struct device *cxlmd_dev = &cxlds->cxlmd->dev; > > - scoped_guard(device, dev) { Dropping the scoped_guard() to guard() here makes it harder to tell what is going on. I guess it isn't quite worth a precursor patch unless there are several similar refactors to do. > - if (!dev->driver) { > - dev_warn(&pdev->dev, > - "%s: memdev disabled, abort error handling\n", > - dev_name(dev)); > - return; > - } > + guard(device)(cxlmd_dev); > > - if (cxlds->rcd) > - cxl_handle_rdport_errors(cxlds); > - > - cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->serial, cxlds->regs.ras); > + if (!cxlmd_dev->driver) { > + dev_warn(&pdev->dev, "%s: memdev disabled, abort error handling", dev_name(dev)); > + return; > } > + > + if (cxlds->rcd) > + cxl_handle_rdport_errors(cxlds); > + > + cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->serial, cxlds->regs.ras); > } > EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL"); > > -pci_ers_result_t cxl_error_detected(struct pci_dev *pdev, > - pci_channel_state_t state) > +void pci_cor_error_detected(struct pci_dev *pdev) > { > + cxl_cor_error_detected(&pdev->dev); > +} > +EXPORT_SYMBOL_NS_GPL(pci_cor_error_detected, "CXL"); I'd go with a cxl_pci_ prefix for this to avoid impression it is is more generic than that. > > + > +pci_ers_result_t pci_error_detected(struct pci_dev *pdev, Given this is CXL specific, probably makes sense to prefix to give cxl_pci_error_detected() Then when we see the call it is more obvious it isn't a 'generic' PCI thing. > + pci_channel_state_t error) > +{ > + pci_ers_result_t rc; > + > + rc = cxl_error_detected(&pdev->dev); > + if (rc == PCI_ERS_RESULT_PANIC) > + panic("CXL cachemem error."); > + > + return rc; > +} > +EXPORT_SYMBOL_NS_GPL(pci_error_detected, "CXL");