Re: [PATCH v9 10/16] cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports

On 6/6/2025 9:41 AM, Bowman, Terry wrote:
>
> On 6/6/2025 4:08 AM, Shiju Jose wrote:
>>> -----Original Message-----
>>> From: Terry Bowman <terry.bowman@xxxxxxx>
>>> Sent: 03 June 2025 18:23
>>> To: PradeepVineshReddy.Kodamati@xxxxxxx; dave@xxxxxxxxxxxx; Jonathan
>>> Cameron <jonathan.cameron@xxxxxxxxxx>; dave.jiang@xxxxxxxxx;
>>> alison.schofield@xxxxxxxxx; vishal.l.verma@xxxxxxxxx; ira.weiny@xxxxxxxxx;
>>> dan.j.williams@xxxxxxxxx; bhelgaas@xxxxxxxxxx; bp@xxxxxxxxx;
>>> ming.li@xxxxxxxxxxxx; Shiju Jose <shiju.jose@xxxxxxxxxx>;
>>> dan.carpenter@xxxxxxxxxx; Smita.KoralahalliChannabasappa@xxxxxxx;
>>> kobayashi.da-06@xxxxxxxxxxx; terry.bowman@xxxxxxx; yanfei.xu@xxxxxxxxx;
>>> rrichter@xxxxxxx; peterz@xxxxxxxxxxxxx; colyli@xxxxxxx;
>>> uaisheng.ye@xxxxxxxxx; fabio.m.de.francesco@xxxxxxxxxxxxxxx;
>>> ilpo.jarvinen@xxxxxxxxxxxxxxx; yazen.ghannam@xxxxxxx; linux-
>>> cxl@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx
>>> Subject: [PATCH v9 10/16] cxl/pci: Unify CXL trace logging for CXL Endpoints and
>>> CXL Ports
>>>
>>> CXL currently has separate trace routines for CXL Port errors and CXL Endpoint
>>> errors. This is inconvenient for the user because they must enable
>>> 2 sets of trace routines. Make updates to the trace logging such that a single
>>> trace routine logs both CXL Endpoint and CXL Port protocol errors.
>>>
>>> Rename the 'host' field from the CXL Endpoint trace to 'parent' in the unified
>>> trace routines. 'host' does not correctly apply to CXL Port devices. Parent is more
>>> general and applies to CXL Port devices and CXL Endpoints.
>>>
>>> Add serial number parameter to the trace logging. This is used for EPs and 0 is
>>> provided for CXL port devices without a serial number.
>>>
>>> Below is output of correctable and uncorrectable protocol error logging.
>>> CXL Root Port and CXL Endpoint examples are included below.
>>>
>>> Root Port:
>>> cxl_aer_correctable_error: device=0000:0c:00.0 parent=pci0000:0c serial: 0
>>> status='CRC Threshold Hit'
>>> cxl_aer_uncorrectable_error: device=0000:0c:00.0 parent=pci0000:0c serial: 0
>>> status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity
>>> Error'
>>>
>>> Endpoint:
>>> cxl_aer_correctable_error: device=mem3 parent=0000:0f:00.0 serial=0
>>> status='CRC Threshold Hit'
>>> cxl_aer_uncorrectable_error: device=mem3 parent=0000:0f:00.0 serial: 0
>>> status: 'Cache Byte Enable Parity Error' first_error: 'Cache Byte Enable Parity
>>> Error'
>>>
>>> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
>>> ---
>>> drivers/cxl/core/pci.c   | 18 +++++----
>>> drivers/cxl/core/ras.c   | 14 ++++---
>>> drivers/cxl/core/trace.h | 84 +++++++++-------------------------------
>>> 3 files changed, 37 insertions(+), 79 deletions(-)
>>>
>>> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c index
>>> 186a5a20b951..0f4c07fd64a5 100644
>>> --- a/drivers/cxl/core/pci.c
>>> +++ b/drivers/cxl/core/pci.c
>>> @@ -664,7 +664,7 @@ void read_cdat_data(struct cxl_port *port)  }
>>> EXPORT_SYMBOL_NS_GPL(read_cdat_data, "CXL");
>>>
>> [...]
>>> static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data
>>> *data) diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index
>>> 25ebfbc1616c..8c91b0f3d165 100644
>>> --- a/drivers/cxl/core/trace.h
>>> +++ b/drivers/cxl/core/trace.h
>>> @@ -48,49 +48,22 @@
>>> 	{ CXL_RAS_UC_IDE_RX_ERR, "IDE Rx Error" }			  \
>>> )
>>>
>>> -TRACE_EVENT(cxl_port_aer_uncorrectable_error,
>>> -	TP_PROTO(struct device *dev, u32 status, u32 fe, u32 *hl),
>>> -	TP_ARGS(dev, status, fe, hl),
>>> -	TP_STRUCT__entry(
>>> -		__string(device, dev_name(dev))
>>> -		__string(host, dev_name(dev->parent))
>>> -		__field(u32, status)
>>> -		__field(u32, first_error)
>>> -		__array(u32, header_log, CXL_HEADERLOG_SIZE_U32)
>>> -	),
>>> -	TP_fast_assign(
>>> -		__assign_str(device);
>>> -		__assign_str(host);
>>> -		__entry->status = status;
>>> -		__entry->first_error = fe;
>>> -		/*
>>> -		 * Embed the 512B headerlog data for user app retrieval and
>>> -		 * parsing, but no need to print this in the trace buffer.
>>> -		 */
>>> -		memcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);
>>> -	),
>>> -	TP_printk("device=%s host=%s status: '%s' first_error: '%s'",
>>> -		  __get_str(device), __get_str(host),
>>> -		  show_uc_errs(__entry->status),
>>> -		  show_uc_errs(__entry->first_error)
>>> -	)
>>> -);
>>> -
>>> TRACE_EVENT(cxl_aer_uncorrectable_error,
>>> -	TP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32
>>> *hl),
>>> -	TP_ARGS(cxlmd, status, fe, hl),
>>> +	TP_PROTO(struct device *dev, u64 serial, u32 status, u32 fe,
>>> +		 u32 *hl),
>>> +	TP_ARGS(dev, serial, status, fe, hl),
>>> 	TP_STRUCT__entry(
>>> -		__string(memdev, dev_name(&cxlmd->dev))
>>> -		__string(host, dev_name(cxlmd->dev.parent))
>>> +		__string(name, dev_name(dev))
>>> +		__string(parent, dev_name(dev->parent))
>> Hi Terry,
>>
>> As we pointed out in v8, renaming the fields "memdev" to "name" and "host" to "parent"
>> causes issues and failures in userspace rasdaemon while parsing the trace event data.
>> Additionally, we can't rename these fields in rasdaemon due to backward compatibility.
> Yes, I remember, but I didn't understand why other SW couldn't be updated to handle it. I will
> change this as you request, but many people will be confused about why a port device's name is
> labeled as a memdev. memdev is only correct for EPs and does not correctly reflect *any* of the
> other CXL device types (RP, USP, DSP).
>
>>> 		__field(u64, serial)
>>> 		__field(u32, status)
>>> 		__field(u32, first_error)
>>> 		__array(u32, header_log, CXL_HEADERLOG_SIZE_U32)
>>> 	),
>>> 	TP_fast_assign(
>>> -		__assign_str(memdev);
>>> -		__assign_str(host);
>>> -		__entry->serial = cxlmd->cxlds->serial;
>>> +		__assign_str(name);
>>> +		__assign_str(parent);
>>> +		__entry->serial = serial;
>>> 		__entry->status = status;
>>> 		__entry->first_error = fe;
>>> 		/*
>>> @@ -99,8 +72,8 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
>>> 		 */
>>> 		memcpy(__entry->header_log, hl, CXL_HEADERLOG_SIZE);
>>> 	),
>>> -	TP_printk("memdev=%s host=%s serial=%lld: status: '%s' first_error:
>>> '%s'",
>>> -		  __get_str(memdev), __get_str(host), __entry->serial,
>>> +	TP_printk("device=%s parent=%s serial=%lld status='%s'
>>> first_error='%s'",
>>> +		  __get_str(name), __get_str(parent), __entry->serial,
>>> 		  show_uc_errs(__entry->status),
>>> 		  show_uc_errs(__entry->first_error)
>>> 	)
>>> @@ -124,42 +97,23 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
>>> 	{ CXL_RAS_CE_PHYS_LAYER_ERR, "Received Error From Physical Layer"
>>> }	\
>>> )
>>>
>>> -TRACE_EVENT(cxl_port_aer_correctable_error,
>>> -	TP_PROTO(struct device *dev, u32 status),
>>> -	TP_ARGS(dev, status),
>>> -	TP_STRUCT__entry(
>>> -		__string(device, dev_name(dev))
>>> -		__string(host, dev_name(dev->parent))
>>> -		__field(u32, status)
>>> -	),
>>> -	TP_fast_assign(
>>> -		__assign_str(device);
>>> -		__assign_str(host);
>>> -		__entry->status = status;
>>> -	),
>>> -	TP_printk("device=%s host=%s status='%s'",
>>> -		  __get_str(device), __get_str(host),
>>> -		  show_ce_errs(__entry->status)
>>> -	)
>>> -);
>>> -
>>> TRACE_EVENT(cxl_aer_correctable_error,
>>> -	TP_PROTO(const struct cxl_memdev *cxlmd, u32 status),
>>> -	TP_ARGS(cxlmd, status),
>>> +	TP_PROTO(struct device *dev, u64 serial, u32 status),
>>> +	TP_ARGS(dev, serial, status),
>>> 	TP_STRUCT__entry(
>>> -		__string(memdev, dev_name(&cxlmd->dev))
>>> -		__string(host, dev_name(cxlmd->dev.parent))
>>> +		__string(name, dev_name(dev))
>>> +		__string(parent, dev_name(dev->parent))
>> Renaming these fields is an issue for userspace, as mentioned above
>> for cxl_aer_uncorrectable_error.
> I understand, I'll revert as you request.
>
> Terry

I'll update the commit message with an explanation for leaving the field names as-is.

Terry
>>> 		__field(u64, serial)
>>> 		__field(u32, status)
>>> 	),
>>> 	TP_fast_assign(
>>> -		__assign_str(memdev);
>>> -		__assign_str(host);
>>> -		__entry->serial = cxlmd->cxlds->serial;
>>> +		__assign_str(name);
>>> +		__assign_str(parent);
>>> +		__entry->serial = serial;
>>> 		__entry->status = status;
>>> 	),
>>> -	TP_printk("memdev=%s host=%s serial=%lld: status: '%s'",
>>> -		  __get_str(memdev), __get_str(host), __entry->serial,
>>> +	TP_printk("device=%s parent=%s serial=%lld status='%s'",
>>> +		  __get_str(name), __get_str(parent), __entry->serial,
>>> 		  show_ce_errs(__entry->status)
>>> 	)
>>> );
>>> --
>>> 2.34.1
>> Thanks,
>> Shiju
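
To illustrate the compatibility concern about renaming the "memdev" and "host" trace fields, here is a minimal userspace sketch, assuming a consumer that looks fields up by name. The struct field type, find_field(), parse_event(), and both layout tables are hypothetical and are not taken from rasdaemon or libtraceevent. They only model the general pattern of a name-keyed lookup failing once the kernel-side field names change.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one decoded trace field: a name and a value. */
struct field {
	const char *name;
	const char *value;
};

/* Fields as named before the patch (Endpoint trace event). */
static const struct field old_layout[] = {
	{ "memdev", "mem3" },
	{ "host", "0000:0f:00.0" },
};

/* The same data with the renamed fields proposed in this patch. */
static const struct field new_layout[] = {
	{ "name", "mem3" },
	{ "parent", "0000:0f:00.0" },
};

/* Look a field up by name; returns NULL when the name is not present. */
static const char *find_field(const struct field *f, size_t n,
			      const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(f[i].name, name) == 0)
			return f[i].value;
	return NULL;
}

/*
 * Hypothetical consumer built against the pre-patch names.
 * Returns 0 on success, -1 if an expected field is missing.
 */
static int parse_event(const struct field *f, size_t n)
{
	const char *memdev = find_field(f, n, "memdev");
	const char *host = find_field(f, n, "host");

	if (!memdev || !host)
		return -1;

	printf("memdev=%s host=%s\n", memdev, host);
	return 0;
}
```

In this sketch, parse_event(old_layout, 2) succeeds, while parse_event(new_layout, 2) returns -1 because neither "memdev" nor "host" is found. Keeping the existing field names avoids that failure for consumers already deployed against them.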




