On 6/12/2025 6:04 AM, Jonathan Cameron wrote: > On Tue, 3 Jun 2025 12:22:26 -0500 > Terry Bowman <terry.bowman@xxxxxxx> wrote: > >> CXL error handling will soon be moved from the AER driver into the CXL >> driver. This requires a notification mechanism for the AER driver to share >> the AER interrupt with the CXL driver. The notification will be used >> as an indication for the CXL drivers to handle and log the CXL RAS errors. >> >> Add a kfifo work queue to be used by the AER driver and CXL driver. The AER >> driver will be the sole kfifo producer adding work and the cxl_core will be >> the sole kfifo consumer removing work. Add the boilerplate kfifo support. >> >> Add CXL work queue handler registration functions in the AER driver. Export >> the functions allowing CXL driver to access. Implement registration >> functions for the CXL driver to assign or clear the work handler function. >> >> Introduce function cxl_create_prot_err_info() and 'struct cxl_prot_err_info'. >> Implement cxl_create_prot_err_info() to populate a 'struct cxl_prot_err_info' >> instance with the AER severity and the erring device's PCI SBDF. The SBDF >> details will be used to rediscover the erring device after the CXL driver >> dequeues the kfifo work. The device rediscovery will be introduced along >> with the CXL handling in future patches. >> >> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx> > Hi Terry, > > A few trivial bits of feedback. I didn't find anything substantial that > hasn't already been covered by others. > > Jonathan > >> --- >> drivers/cxl/core/ras.c | 31 +++++++++- >> drivers/cxl/cxlpci.h | 1 + >> drivers/pci/pcie/aer.c | 132 ++++++++++++++++++++++++++++------------- >> include/linux/aer.h | 36 +++++++++++ >> 4 files changed, 157 insertions(+), 43 deletions(-) >> >> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c >> index 485a831695c7..d35525e79e04 100644 >> --- a/drivers/cxl/core/ras.c >> +++ b/drivers/cxl/core/ras.c >> @@ -5,6 +5,7 @@ >> >> void cxl_ras_exit(void) >> { >> cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work); >> cancel_work_sync(&cxl_cper_prot_err_work); >> + >> + cxl_unregister_prot_err_work(); >> + cancel_work_sync(&cxl_prot_err_work); >> } >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c >> index adb4b1123b9b..5350fa5be784 100644 >> --- a/drivers/pci/pcie/aer.c >> +++ b/drivers/pci/pcie/aer.c >> +static int cxl_create_prot_error_info(struct pci_dev *pdev, >> + struct aer_err_info *aer_err_info, >> + struct cxl_prot_error_info *cxl_err_info) >> +{ >> + cxl_err_info->severity = aer_err_info->severity; >> + >> + cxl_err_info->function = PCI_FUNC(pdev->devfn); >> + cxl_err_info->device = PCI_SLOT(pdev->devfn); >> + cxl_err_info->bus = pdev->bus->number; >> + cxl_err_info->segment = pci_domain_nr(pdev->bus); > Maybe > *cxl_err_info = (struct cxl_prot_error_info) { > .severity = aer_err_info->severity, > .function = PCI_FUNC(pdev->devfn), > .device = PCI_SLOT(pdev->devfn), > .bus = pdev->bus_number, > .segment = pci_domain_nr(dev->nbus), > }; > Ok. And I'll take your other recommendation from patch [4/16] to not split devfn. > Or if it isn't going to get more use later, just put that assignment in > forward_cxl_error as this helper doesn't seem to add a huge amount of > value wrt to code reability. > Good idea. This function started out doing more work but has been simplified that it no longer needs be in a helper function. >> + >> + return 0; >> +} >> + >> +static void forward_cxl_error(struct pci_dev *pdev, struct aer_err_info *aer_err_info) >> +{ >> + struct cxl_prot_err_work_data wd; >> + struct cxl_prot_error_info *cxl_err_info = &wd.err_info; >> + >> + cxl_create_prot_error_info(pdev, aer_err_info, cxl_err_info); > cxl_create_prot_error_info(pdev, aer_err_info, &wd.err_info); > > Ignore if this gets more complicated in a later patch but for now I don't > see the local variable adding any value. If anything it slightly obscures > how this gets used via wd in the next line. However given the code is short > that is fairly obvious either way. > > Or as above > wd.error_info = (struct cxl_prot_error_info) { > .severity = ... > }; Ok. I'll let you know if it doesn't work. ;) >> + >> + if (!kfifo_put(&cxl_prot_err_fifo, wd)) { >> + dev_err_ratelimited(&pdev->dev, "CXL kfifo overflow\n"); >> + return; >> + } >> + >> + schedule_work(cxl_prot_err_work); >> +} >> + >> #else >> static inline void cxl_rch_enable_rcec(struct pci_dev *dev) { } >> static inline void cxl_rch_handle_error(struct pci_dev *dev, >> struct aer_err_info *info) { } >> +static inline void forward_cxl_error(struct pci_dev *dev, >> + struct aer_err_info *info) { } > Is this aligned right - looks one space short? > >> +static inline bool handles_cxl_errors(struct pci_dev *dev) >> +{ >> + return false; >> +} >> +static bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info) { return 0; }; > wrap choices for the two stubs are a bit odd. The first one is actually shorter > if not wrapped than the second one is that you chose not to wrap. > My slightly preference would be to wrap them both as the fist one. > Thanks for pointing out. Fortunately, this (and above alignment note) is moved into a new file pci/pcie/cxl_aer.c in the next revision and no longer requires this source file #ifdef. -Terry >> #endif >