On 6/27/2025 7:22 AM, Shiju Jose wrote: >> -----Original Message----- >> From: Terry Bowman <terry.bowman@xxxxxxx> >> Sent: 26 June 2025 23:43 >> To: dave@xxxxxxxxxxxx; Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>; >> dave.jiang@xxxxxxxxx; alison.schofield@xxxxxxxxx; dan.j.williams@xxxxxxxxx; >> bhelgaas@xxxxxxxxxx; Shiju Jose <shiju.jose@xxxxxxxxxx>; >> ming.li@xxxxxxxxxxxx; Smita.KoralahalliChannabasappa@xxxxxxx; >> rrichter@xxxxxxx; dan.carpenter@xxxxxxxxxx; >> PradeepVineshReddy.Kodamati@xxxxxxx; lukas@xxxxxxxxx; >> Benjamin.Cheatham@xxxxxxx; >> sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx; terry.bowman@xxxxxxx; >> linux-cxl@xxxxxxxxxxxxxxx >> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx >> Subject: [PATCH v10 12/17] cxl/pci: Unify CXL trace logging for CXL Endpoints >> and CXL Ports >> >> CXL currently has separate trace routines for CXL Port errors and CXL Endpoint >> errors. This is inconvenient for the user because they must enable >> 2 sets of trace routines. Make updates to the trace logging such that a single >> trace routine logs both CXL Endpoint and CXL Port protocol errors. >> >> Keep the trace log fields 'memdev' and 'host'. While these are not accurate for >> non-Endpoints the fields will remain as-is to prevent breaking userspace RAS >> trace consumers. >> >> Add serial number parameter to the trace logging. This is used for EPs and 0 is >> provided for CXL port devices without a serial number. >> >> Below is output of correctable and uncorrectable protocol error logging. >> CXL Root Port and CXL Endpoint examples are included below. >> >> Root Port: >> cxl_aer_correctable_error: memdev=0000:0c:00.0 host=pci0000:0c serial: 0 >> status='CRC Threshold Hit' >> cxl_aer_uncorrectable_error: memdev=0000:0c:00.0 host=pci0000:0c serial: 0 >> status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity >> Error' >> >> Endpoint: >> cxl_aer_correctable_error: memdev=mem3 host=0000:0f:00.0 serial=0 >> status='CRC Threshold Hit' >> cxl_aer_uncorrectable_error: memdev=mem3 host=0000:0f:00.0 serial: 0 status: >> 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity Error' >> >> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx> >> --- >> drivers/cxl/core/pci.c | 19 ++++----- >> drivers/cxl/core/ras.c | 14 ++++--- >> drivers/cxl/core/trace.h | 84 +++++++++------------------------------- >> 3 files changed, 37 insertions(+), 80 deletions(-) >> > [...] >> static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data >> *data) diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index >> 25ebfbc1616c..494d6db461a7 100644 >> --- a/drivers/cxl/core/trace.h >> +++ b/drivers/cxl/core/trace.h >> @@ -48,49 +48,22 @@ >> { CXL_RAS_UC_IDE_RX_ERR, "IDE Rx Error" } \ >> ) >> >> -TRACE_EVENT(cxl_port_aer_uncorrectable_error, >> - TP_PROTO(struct device *dev, u32 status, u32 fe, u32 *hl), >> - TP_ARGS(dev, status, fe, hl), >> - TP_STRUCT__entry( >> - __string(device, dev_name(dev)) >> - __string(host, dev_name(dev->parent)) >> - __field(u32, status) >> - __field(u32, first_error) >> - __array(u32, header_log, CXL_HEADERLOG_SIZE_U32) >> - ), >> - TP_fast_assign( >> - __assign_str(device); >> - __assign_str(host); >> - __entry->status = status; >> - __entry->first_error = fe; >> - /* >> - * Embed the 512B headerlog data for user app retrieval and >> - * parsing, but no need to print this in the trace buffer. >> - */ >> - memcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE); >> - ), >> - TP_printk("device=%s host=%s status: '%s' first_error: '%s'", >> - __get_str(device), __get_str(host), >> - show_uc_errs(__entry->status), >> - show_uc_errs(__entry->first_error) >> - ) >> -); >> - >> TRACE_EVENT(cxl_aer_uncorrectable_error, >> - TP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32 >> *hl), >> - TP_ARGS(cxlmd, status, fe, hl), >> + TP_PROTO(struct device *dev, u64 serial, u32 status, u32 fe, >> + u32 *hl), >> + TP_ARGS(dev, serial, status, fe, hl), >> TP_STRUCT__entry( >> - __string(memdev, dev_name(&cxlmd->dev)) >> - __string(host, dev_name(cxlmd->dev.parent)) >> + __string(name, dev_name(dev)) >> + __string(parent, dev_name(dev->parent)) > Hi Terry, > > Thanks for considering the feedback given in v9 regarding the compatibility issue > with the rasdaemon. > https://lore.kernel.org/all/959acc682e6e4b52ac0283b37ee21026@xxxxxxxxxx/ > > Probably some confusion w.r.t the feedback. > Unfortunately TP_printk(...) is not an ABI that we need to keep stable, > it's this structure, TP_STRUCT__entry(..) , that matters to the rasdaemon. > >> Oh. Apologies, I didn't realize TP_STRUCT was an ABI requirement. I will change back the TP_STRUCT as well. -Terry >> __field(u64, serial) >> __field(u32, status) >> __field(u32, first_error) >> __array(u32, header_log, CXL_HEADERLOG_SIZE_U32) >> ), >> TP_fast_assign( >> - __assign_str(memdev); >> - __assign_str(host); >> - __entry->serial = cxlmd->cxlds->serial; >> + __assign_str(name); >> + __assign_str(parent); >> + __entry->serial = serial; >> __entry->status = status; >> __entry->first_error = fe; >> /* >> @@ -99,8 +72,8 @@ TRACE_EVENT(cxl_aer_uncorrectable_error, >> */ >> memcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE); >> ), >> - TP_printk("memdev=%s host=%s serial=%lld: status: '%s' first_error: >> '%s'", >> - __get_str(memdev), __get_str(host), __entry->serial, >> + TP_printk("memdev=%s host=%s serial=%lld status='%s' >> first_error='%s'", >> + __get_str(name), __get_str(parent), __entry->serial, >> show_uc_errs(__entry->status), >> show_uc_errs(__entry->first_error) >> ) >> @@ -124,42 +97,23 @@ TRACE_EVENT(cxl_aer_uncorrectable_error, >> { CXL_RAS_CE_PHYS_LAYER_ERR, "Received Error From Physical Layer" >> } \ >> ) >> >> -TRACE_EVENT(cxl_port_aer_correctable_error, >> - TP_PROTO(struct device *dev, u32 status), >> - TP_ARGS(dev, status), >> - TP_STRUCT__entry( >> - __string(device, dev_name(dev)) >> - __string(host, dev_name(dev->parent)) >> - __field(u32, status) >> - ), >> - TP_fast_assign( >> - __assign_str(device); >> - __assign_str(host); >> - __entry->status = status; >> - ), >> - TP_printk("device=%s host=%s status='%s'", >> - __get_str(device), __get_str(host), >> - show_ce_errs(__entry->status) >> - ) >> -); >> - >> TRACE_EVENT(cxl_aer_correctable_error, >> - TP_PROTO(const struct cxl_memdev *cxlmd, u32 status), >> - TP_ARGS(cxlmd, status), >> + TP_PROTO(struct device *dev, u64 serial, u32 status), >> + TP_ARGS(dev, serial, status), >> TP_STRUCT__entry( >> - __string(memdev, dev_name(&cxlmd->dev)) >> - __string(host, dev_name(cxlmd->dev.parent)) >> + __string(name, dev_name(dev)) >> + __string(parent, dev_name(dev->parent)) > Same as above. >> __field(u64, serial) >> __field(u32, status) >> ), >> TP_fast_assign( >> - __assign_str(memdev); >> - __assign_str(host); >> - __entry->serial = cxlmd->cxlds->serial; >> + __assign_str(name); >> + __assign_str(parent); >> + __entry->serial = serial; >> __entry->status = status; >> ), >> - TP_printk("memdev=%s host=%s serial=%lld: status: '%s'", >> - __get_str(memdev), __get_str(host), __entry->serial, >> + TP_printk("memdev=%s host=%s serial=%lld status='%s'", >> + __get_str(name), __get_str(parent), __entry->serial, >> show_ce_errs(__entry->status) >> ) >> ); >> -- >> 2.34.1 > Thanks, > Shiju