On 6/6/2025 9:41 AM, Bowman, Terry wrote:
>
> On 6/6/2025 4:08 AM, Shiju Jose wrote:
>>> -----Original Message-----
>>> From: Terry Bowman <terry.bowman@xxxxxxx>
>>> Sent: 03 June 2025 18:23
>>> To: PradeepVineshReddy.Kodamati@xxxxxxx; dave@xxxxxxxxxxxx; Jonathan
>>> Cameron <jonathan.cameron@xxxxxxxxxx>; dave.jiang@xxxxxxxxx;
>>> alison.schofield@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx;
>>> dan.j.williams@xxxxxxxxx; bhelgaas@xxxxxxxxxx; bp@xxxxxxxxx;
>>> ming.li@xxxxxxxxxxxx; Shiju Jose <shiju.jose@xxxxxxxxxx>;
>>> dan.carpenter@xxxxxxxxxx; Smita.KoralahalliChannabasappa@xxxxxxx;
>>> kobayashi.da-06@xxxxxxxxxxx; terry.bowman@xxxxxxx; yanfei.xu@xxxxxxxxx;
>>> rrichter@xxxxxxx; peterz@xxxxxxxxxxxxx; colyli@xxxxxxx;
>>> uaisheng.ye@xxxxxxxxx; fabio.m.de.francesco@xxxxxxxxxxxxxxx;
>>> ilpo.jarvinen@xxxxxxxxxxxxxxx; yazen.ghannam@xxxxxxx;
>>> linux-cxl@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx
>>> Subject: [PATCH v9 10/16] cxl/pci: Unify CXL trace logging for CXL Endpoints and
>>> CXL Ports
>>>
>>> CXL currently has separate trace routines for CXL Port errors and CXL Endpoint
>>> errors. This is inconvenient for the user because they must enable
>>> 2 sets of trace routines. Make updates to the trace logging such that a single
>>> trace routine logs both CXL Endpoint and CXL Port protocol errors.
>>>
>>> Rename the 'host' field from the CXL Endpoint trace to 'parent' in the unified
>>> trace routines. 'host' does not correctly apply to CXL Port devices. Parent is more
>>> general and applies to CXL Port devices and CXL Endpoints.
>>>
>>> Add serial number parameter to the trace logging. This is used for EPs and 0 is
>>> provided for CXL port devices without a serial number.
>>>
>>> Below is output of correctable and uncorrectable protocol error logging.
>>> CXL Root Port and CXL Endpoint examples are included below.
>>>
>>> Root Port:
>>> cxl_aer_correctable_error: device=0000:0c:00.0 parent=pci0000:0c serial: 0
>>> status='CRC Threshold Hit'
>>> cxl_aer_uncorrectable_error: device=0000:0c:00.0 parent=pci0000:0c serial: 0
>>> status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity
>>> Error'
>>>
>>> Endpoint:
>>> cxl_aer_correctable_error: device=mem3 parent=0000:0f:00.0 serial=0
>>> status='CRC Threshold Hit'
>>> cxl_aer_uncorrectable_error: device=mem3 parent=0000:0f:00.0 serial: 0
>>> status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity
>>> Error'
>>>
>>> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
>>> ---
>>> drivers/cxl/core/pci.c | 18 +++++----
>>> drivers/cxl/core/ras.c | 14 ++++---
>>> drivers/cxl/core/trace.h | 84 +++++++++-------------------------------
>>> 3 files changed, 37 insertions(+), 79 deletions(-)
>>>
>>> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c index
>>> 186a5a20b951..0f4c07fd64a5 100644
>>> --- a/drivers/cxl/core/pci.c
>>> +++ b/drivers/cxl/core/pci.c
>>> @@ -664,7 +664,7 @@ void read_cdat_data(struct cxl_port *port) }
>>> EXPORT_SYMBOL_NS_GPL(read_cdat_data, "CXL");
>>>
>> [...]
>>> static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data
>>> *data) diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index
>>> 25ebfbc1616c..8c91b0f3d165 100644
>>> --- a/drivers/cxl/core/trace.h
>>> +++ b/drivers/cxl/core/trace.h
>>> @@ -48,49 +48,22 @@
>>> { CXL_RAS_UC_IDE_RX_ERR, "IDE Rx Error" } \
>>> )
>>>
>>> -TRACE_EVENT(cxl_port_aer_uncorrectable_error,
>>> - TP_PROTO(struct device *dev, u32 status, u32 fe, u32 *hl),
>>> - TP_ARGS(dev, status, fe, hl),
>>> - TP_STRUCT__entry(
>>> - __string(device, dev_name(dev))
>>> - __string(host, dev_name(dev->parent))
>>> - __field(u32, status)
>>> - __field(u32, first_error)
>>> - __array(u32, header_log, CXL_HEADERLOG_SIZE_U32)
>>> - ),
>>> - TP_fast_assign(
>>> - __assign_str(device);
>>> - __assign_str(host);
>>> - __entry->status = status;
>>> - __entry->first_error = fe;
>>> - /*
>>> - * Embed the 512B headerlog data for user app retrieval and
>>> - * parsing, but no need to print this in the trace buffer.
>>> - */
>>> - memcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);
>>> - ),
>>> - TP_printk("device=%s host=%s status: '%s' first_error: '%s'",
>>> - __get_str(device), __get_str(host),
>>> - show_uc_errs(__entry->status),
>>> - show_uc_errs(__entry->first_error)
>>> - )
>>> -);
>>> -
>>> TRACE_EVENT(cxl_aer_uncorrectable_error,
>>> - TP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32
>>> *hl),
>>> - TP_ARGS(cxlmd, status, fe, hl),
>>> + TP_PROTO(struct device *dev, u64 serial, u32 status, u32 fe,
>>> + u32 *hl),
>>> + TP_ARGS(dev, serial, status, fe, hl),
>>> TP_STRUCT__entry(
>>> - __string(memdev, dev_name(&cxlmd->dev))
>>> - __string(host, dev_name(cxlmd->dev.parent))
>>> + __string(name, dev_name(dev))
>>> + __string(parent, dev_name(dev->parent))
>> Hi Terry,
>>
>> As we pointed out in v8, renaming the fields "memdev" to "name" and "host" to "parent"
>> causes issues and failures in userspace rasdaemon while parsing the trace event data.
>> Additionally, we can't rename these fields in rasdaemon due to backward compatibility.
> Yes, I remember but didn't understand why other SW couldn't be updated to handle. I will
> change as you request but many people will be confused why a port device's name is labeled
> as a memdev. memdev is only correct for EPs and does not correctly reflect *any* of the
> other CXL device types (RP, USP, DSP).
>
>>> __field(u64, serial)
>>> __field(u32, status)
>>> __field(u32, first_error)
>>> __array(u32, header_log, CXL_HEADERLOG_SIZE_U32)
>>> ),
>>> TP_fast_assign(
>>> - __assign_str(memdev);
>>> - __assign_str(host);
>>> - __entry->serial = cxlmd->cxlds->serial;
>>> + __assign_str(name);
>>> + __assign_str(parent);
>>> + __entry->serial = serial;
>>> __entry->status = status;
>>> __entry->first_error = fe;
>>> /*
>>> @@ -99,8 +72,8 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
>>> */
>>> memcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);
>>> ),
>>> - TP_printk("memdev=%s host=%s serial=%lld: status: '%s' first_error:
>>> '%s'",
>>> - __get_str(memdev), __get_str(host), __entry->serial,
>>> + TP_printk("device=%s parent=%s serial=%lld status='%s'
>>> first_error='%s'",
>>> + __get_str(name), __get_str(parent), __entry->serial,
>>> show_uc_errs(__entry->status),
>>> show_uc_errs(__entry->first_error)
>>> )
>>> @@ -124,42 +97,23 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
>>> { CXL_RAS_CE_PHYS_LAYER_ERR, "Received Error From Physical Layer"
>>> } \
>>> )
>>>
>>> -TRACE_EVENT(cxl_port_aer_correctable_error,
>>> - TP_PROTO(struct device *dev, u32 status),
>>> - TP_ARGS(dev, status),
>>> - TP_STRUCT__entry(
>>> - __string(device, dev_name(dev))
>>> - __string(host, dev_name(dev->parent))
>>> - __field(u32, status)
>>> - ),
>>> - TP_fast_assign(
>>> - __assign_str(device);
>>> - __assign_str(host);
>>> - __entry->status = status;
>>> - ),
>>> - TP_printk("device=%s host=%s status='%s'",
>>> - __get_str(device), __get_str(host),
>>> - show_ce_errs(__entry->status)
>>> - )
>>> -);
>>> -
>>> TRACE_EVENT(cxl_aer_correctable_error,
>>> - TP_PROTO(const struct cxl_memdev *cxlmd, u32 status),
>>> - TP_ARGS(cxlmd, status),
>>> + TP_PROTO(struct device *dev, u64 serial, u32 status),
>>> + TP_ARGS(dev, serial, status),
>>> TP_STRUCT__entry(
>>> - __string(memdev, dev_name(&cxlmd->dev))
>>> - __string(host, dev_name(cxlmd->dev.parent))
>>> + __string(name, dev_name(dev))
>>> + __string(parent, dev_name(dev->parent))
>> Renaming these fields is an issue for userspace as mentioned above
>> in cxl_aer_uncorrectable_error.
> I understand, I'll revert as you request.
>
> Terry

I'll update the commit message with explanation for leaving as-is.

Terry

>>> __field(u64, serial)
>>> __field(u32, status)
>>> ),
>>> TP_fast_assign(
>>> - __assign_str(memdev);
>>> - __assign_str(host);
>>> - __entry->serial = cxlmd->cxlds->serial;
>>> + __assign_str(name);
>>> + __assign_str(parent);
>>> + __entry->serial = serial;
>>> __entry->status = status;
>>> ),
>>> - TP_printk("memdev=%s host=%s serial=%lld: status: '%s'",
>>> - __get_str(memdev), __get_str(host), __entry->serial,
>>> + TP_printk("device=%s parent=%s serial=%lld status='%s'",
>>> + __get_str(name), __get_str(parent), __entry->serial,
>>> show_ce_errs(__entry->status)
>>> )
>>> );
>>> --
>>> 2.34.1
>> Thanks,
>> Shiju
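
For anyone reading this thread later, the compatibility concern raised above can be sketched in a few lines of C. This is not rasdaemon's code and not a real tracing API: `struct trace_field`, `struct trace_event`, `find_field()` and `parse_cxl_aer_event()` are hypothetical names made up for illustration, and the sample values come from the Endpoint example in the commit message. It only assumes that a userspace consumer looks fields up by name, in which case it stops finding them once "memdev" and "host" become "name" and "parent".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one named string field of a recorded event. */
struct trace_field {
	const char *name;
	const char *value;
};

/* Hypothetical stand-in for a trace event as a userspace consumer sees it. */
struct trace_event {
	const struct trace_field *fields;
	size_t nr_fields;
};

/* Return the value of the named field, or NULL if the event lacks it. */
const char *find_field(const struct trace_event *ev, const char *name)
{
	assert(ev != NULL && name != NULL);

	for (size_t i = 0; i < ev->nr_fields; i++) {
		if (strcmp(ev->fields[i].name, name) == 0)
			return ev->fields[i].value;
	}
	return NULL;
}

/*
 * Consumer written against the existing layout: it asks for "memdev"
 * and "host" by name. Returns 0 on success, -1 if either is missing.
 */
int parse_cxl_aer_event(const struct trace_event *ev,
			const char **memdev, const char **host)
{
	*memdev = find_field(ev, "memdev");
	*host = find_field(ev, "host");

	return (*memdev != NULL && *host != NULL) ? 0 : -1;
}

/* Event as emitted with the existing field names. */
const struct trace_field old_fields[] = {
	{ "memdev", "mem3" },
	{ "host", "0000:0f:00.0" },
};
const struct trace_event old_layout = { old_fields, 2 };

/* Same event with the fields renamed as in this version of the patch. */
const struct trace_field new_fields[] = {
	{ "name", "mem3" },
	{ "parent", "0000:0f:00.0" },
};
const struct trace_event new_layout = { new_fields, 2 };
```

Calling `parse_cxl_aer_event()` on `old_layout` succeeds, while the same call on `new_layout` returns -1 even though the data is identical. An already-deployed consumer built this way cannot be fixed by a kernel update alone, which is the backward-compatibility argument for keeping the existing field names.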