RE: [PATCH v12 1/2] ACPI:RAS2: Add ACPI RAS2 driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>-----Original Message-----
>From: Borislav Petkov <bp@xxxxxxxxx>
>Sent: 10 September 2025 20:27
>To: Shiju Jose <shiju.jose@xxxxxxxxxx>
>Cc: rafael@xxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; rppt@xxxxxxxxxx;
>dferguson@xxxxxxxxxxxxxxxxxxx; linux-edac@xxxxxxxxxxxxxxx; linux-
>acpi@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; linux-doc@xxxxxxxxxxxxxxx;
>tony.luck@xxxxxxxxx; lenb@xxxxxxxxxx; leo.duran@xxxxxxx;
>Yazen.Ghannam@xxxxxxx; mchehab@xxxxxxxxxx; Jonathan Cameron
><jonathan.cameron@xxxxxxxxxx>; Linuxarm <linuxarm@xxxxxxxxxx>;
>rientjes@xxxxxxxxxx; jiaqiyan@xxxxxxxxxx; Jon.Grimm@xxxxxxx;
>dave.hansen@xxxxxxxxxxxxxxx; naoya.horiguchi@xxxxxxx;
>james.morse@xxxxxxx; jthoughton@xxxxxxxxxx; somasundaram.a@xxxxxxx;
>erdemaktas@xxxxxxxxxx; pgonda@xxxxxxxxxx; duenwen@xxxxxxxxxx;
>gthelen@xxxxxxxxxx; wschwartz@xxxxxxxxxxxxxxxxxxx;
>wbs@xxxxxxxxxxxxxxxxxxxxxx; nifan.cxl@xxxxxxxxx; tanxiaofei
><tanxiaofei@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; Roberto
>Sassu <roberto.sassu@xxxxxxxxxx>; kangkang.shen@xxxxxxxxxxxxx;
>wanghuiqiang <wanghuiqiang@xxxxxxxxxx>
>Subject: Re: [PATCH v12 1/2] ACPI:RAS2: Add ACPI RAS2 driver
>
>On Tue, Sep 02, 2025 at 06:30:39PM +0100, shiju.jose@xxxxxxxxxx wrote:
>> From: Shiju Jose <shiju.jose@xxxxxxxxxx>
>>
>> Add support for ACPI RAS2 feature table (RAS2) defined in the ACPI 6.5
>> Specification, section 5.2.21.
>> Driver defines RAS2 Init, which extracts the RAS2 table and driver
>

Thanks for reviewing  and for the feedback.

>What is "RAS2 Init"?
>
>$ grep -EriIn "ras2.*init" ras2.mbox
>188:    - Rename ras2_acpi_init() to acpi_ras2_init() and modified to call
>acpi_ras2_init()
>283:Driver defines RAS2 Init, which extracts the RAS2 table and driver
>391:+   acpi_ras2_init();
>410:+ * Driver contains ACPI RAS2 init, which extracts the ACPI RAS2 table and
>774:+void __init acpi_ras2_init(void)
>
>...
>
>I guess you mean acpi_ras2_init().

Yes. 

>
>> adds auxiliary device for each memory feature which binds to the
>> RAS2 memory driver.
>
>But if you mean that, why would you explain what the patch does? That should
>be visible.
>
>What you should explaining is the "why".
Will do.  I will update the patch header.

>
>> Driver uses PCC mailbox to communicate with the ACPI HW and the driver
[...]
>>
>> Thus, the RAS2 driver requires the lowest contiguous physical memory
>> range of the memory associated with a NUMA domain when communicating
>> with the firmware for memory-related features such as scrubbing. The
>> driver uses the component instance ID, as defined in Table 5.80, to
>> retrieve the lowest contiguous physical memory address range within
>> the NUMA node and store it in the struct ras2_context to expose the
>> address range to the userspace and for the communication with the firmware.
>
>This commit message needs serious rewriting and I don't even know what it
>should contain. It is a dump of random text which reads like it is trying to tell me
>something about this new driver but I'm failing to grok what exactly...

I will move these description to the code where it retrieve physical memory range
for the NUMA domain. 
>
>> Co-developed-by: A Somasundaram <somasundaram.a@xxxxxxx>
>> Signed-off-by: A Somasundaram <somasundaram.a@xxxxxxx>
>> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
>> Tested-by: Daniel Ferguson <danielf@xxxxxxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx>
>> ---
>>  drivers/acpi/Kconfig  |  12 ++
>>  drivers/acpi/Makefile |   1 +
>>  drivers/acpi/bus.c    |   3 +
>>  drivers/acpi/ras2.c   | 389 ++++++++++++++++++++++++++++++++++++++++++
>>  include/acpi/ras2.h   |  63 +++++++
>>  5 files changed, 468 insertions(+)
>>  create mode 100644 drivers/acpi/ras2.c  create mode 100644
>> include/acpi/ras2.h
>>
>> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index
>> b594780a57d7..db21bf5a39c7 100644
>> --- a/drivers/acpi/Kconfig
>> +++ b/drivers/acpi/Kconfig
>> @@ -293,6 +293,18 @@ config ACPI_CPPC_LIB
>>  	  If your platform does not support CPPC in firmware,
>>  	  leave this option disabled.
>>
>> +config ACPI_RAS2
>> +	bool "ACPI RAS2 driver"
>> +	select AUXILIARY_BUS
>> +	select MAILBOX
>> +	select PCC
>> +	select NUMA_KEEP_MEMINFO if NUMA_MEMBLKS
>
>Why is this selecting crap instead of depending on the facilities it uses?
Ok. 
>
>> +	help
>> +	  The driver adds support for ACPI RAS2 feature table (extracts RAS2
>> +	  table from OS system table) and OSPM interfaces to send RAS2
>> +	  commands via PCC mailbox subspace. Driver adds platform device for
>> +	  the RAS2 memory features which binds to the RAS2 memory driver.
>
>This help text is useless and is throwing random abbreviations around like it ain't
>no tomorrow.
>
>Also, in all your text: use definite or indefinite articles because it reads weird
>otherwise.
I will change.
>
>> +
[...]
>>
>> diff --git a/drivers/acpi/ras2.c b/drivers/acpi/ras2.c new file mode
>> 100644 index 000000000000..a17eab9eecf1
>> --- /dev/null
>> +++ b/drivers/acpi/ras2.c
>> @@ -0,0 +1,389 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Implementation of ACPI RAS2 driver.
>> + *
>> + * Copyright (c) 2024-2025 HiSilicon Limited.
>> + *
>> + * Support for RAS2 - ACPI 6.5 Specification, section 5.2.21
>> + *
>> + * Driver contains ACPI RAS2 init, which extracts the ACPI RAS2 table
>> +and
>> + * get the PCC channel subspace for communicating with the ACPI
>> +compliant
>> + * HW platform which supports ACPI RAS2. Driver adds auxiliary
>> +devices
>> + * for each RAS2 memory feature which binds to the memory ACPI RAS2
>driver.
>
>For whom is this "text" here and what is its purpose?
>
>It'd be more helpful if you at least explained what all those abbreviations meant
>at the top here, so that people can go find out about them if needed.
Will do.
>
>> + */
>> +
>> +#define pr_fmt(fmt) "ACPI RAS2: " fmt
[...]
>> + * Arbitrary retries for PCC commands because the remote processor
>> + * could be much slower to reply. Keeping it high enough to cover
>> + * emulators where the processors run painfully slow.
>> + */
>> +#define RAS2_NUM_RETRIES 600ULL
>
>Why doesn't it have "PCC" in the name then?
Will change.

>
>> +#define RAS2_FEAT_TYPE_MEMORY 0x00
>> +
>> +static int ras2_report_cap_error(u32 cap_status)
>
>All static functions do not need a "ras2_" namespace.
>
>Also, this function is not reporting anything. Function naming *is* important.
Will change.

>
>> +{
>> +	switch (cap_status) {
>> +	case ACPI_RAS2_NOT_VALID:
>> +	case ACPI_RAS2_NOT_SUPPORTED:
>> +		return -EPERM;
>> +	case ACPI_RAS2_BUSY:
>> +		return -EBUSY;
>> +	case ACPI_RAS2_FAILED:
>> +	case ACPI_RAS2_ABORTED:
>> +	case ACPI_RAS2_INVALID_DATA:
>> +		return -EINVAL;
>> +	default: /* 0 or other, Success */
>
>Comments go ontop, not sideways.
>
>And this comment is useless too.
Ok.

>
>> +		return 0;
>> +	}
>> +}
>> +
>> +static int ras2_check_pcc_chan(struct ras2_pcc_subspace
>> +*pcc_subspace)
>
>struct ras2_pcc_subspace *sspc
>
>or so, so that you don't have insane long lines.
>
>> +{
>> +	struct acpi_ras2_shmem __iomem *gen_comm_base = pcc_subspace-
>>comm_addr;
>> +	u32 cap_status, rc;
>> +	u16 status;
>> +
>> +	/*
>> +	 * As per ACPI spec, the PCC space will be initialized by
>> +	 * platform and should have set the command completion bit when
>> +	 * PCC can be used by OSPM.
>> +	 *
>> +	 * Poll PCC status register every 3us(delay_us) for maximum of
>
>delay_us?
>
>> +	 * deadline_us(timeout_us)
>
>timeout_us?
>
>> until PCC command complete bit is set(cond).
>
>cond?
Will update the comment.

>
>> +	 */
>> +	rc = readw_relaxed_poll_timeout(&gen_comm_base->status, status,
>> +					status &
>PCC_STATUS_CMD_COMPLETE, 3,
>> +					pcc_subspace->deadline_us);
>> +	if (rc) {
>
>This function returns an int, you store it into a u32, caller checks for rc < 0...
>
>What a mess. :-(

Will fix.
>
>> +		pr_warn("PCC check channel timeout for pcc_id=%d rc=%d\n",
>> +			pcc_subspace->pcc_id, rc);
>> +		return rc;
>> +	}
>> +
[...]
>> +	/*
>> +	 * Handle the Minimum Request Turnaround Time(MRTT).
>						     ^
>
>In all your text: put a space before the abbreviation in brackets.
>
Will do.

>
>> +	 * "The minimum amount of time that OSPM must wait after the
>completion
[...]
><---- newline here.
Ok.
>
>> +	/*
>> +	 * Retrieve the lowest contiguous physical memory address range within
>> +	 * the NUMA node.
>> +	 */
>
>Why is this requirement here?
The physical memory address range retrieved here for the NUMA domain is used in the subsequent
patch  [PATCH v12 2/2] ras: mem: Add memory ACPI RAS2 driver,
1. to set Requested Address Range(INPUT) field of Table 5.87: Parameter Block Structure for PATROL_SCRUB
when send GET_PATROL_PARAMETERS command to the firmware, to get scrub parameters, running status,
current scrub rate etc.
2. for the validity check of the user requested memory address range to scrub. 
Also intended to expose this supported memory address range to the
userspace via EDAC scrub control interface, though it is not present now.

>
>The commit message is trying to explain it but I'm still none the wiser.
>
>> +	start_pfn = node_start_pfn(ras2_ctx->sys_comp_nid);
>> +	size_pfn = node_spanned_pages(ras2_ctx->sys_comp_nid);
>> +	if (!size_pfn) {
>> +		pr_debug("Failed to find phy addr range for NUMA node(%u)\n",
>> +			 pxm_inst);
>> +		goto ctx_free;
>> +	}
>> +	ras2_ctx->mem_base_addr = __pfn_to_phys(start_pfn);
>> +	ras2_ctx->mem_size = __pfn_to_phys(size_pfn);
>> +
>> +	rc = ras2_register_pcc_channel(ras2_ctx, channel);
>> +	if (rc < 0) {
>> +		pr_debug("Failed to register pcc channel rc=%d\n", rc);
>> +		goto ctx_free;
>> +	}
>> +
>> +	id = ida_alloc(&ras2_ida, GFP_KERNEL);
>> +	if (id < 0) {
>> +		rc = id;
>> +		goto ctx_free;
>
>You just leaked pcc_subspace which ras2_register_pcc_channel() allocated.
Thanks for spotting this. The ras2_pcc_put() was removed in v11 during simplification,
but missed to call  pcc_mbox_free_channel(...) and kfree(pcc_subspace) here.
Will fix.

>
>> +	}
>> +
>> +	ras2_ctx->adev.id		= id;
>> +	ras2_ctx->adev.name		= RAS2_MEM_DEV_ID_NAME;
>> +	ras2_ctx->adev.dev.release	= ras2_release;
>> +	ras2_ctx->adev.dev.parent	= ras2_ctx->dev;
>> +
>> +	rc = auxiliary_device_init(&ras2_ctx->adev);
>> +	if (rc)
>> +		goto ida_free;
>> +
>> +	rc = auxiliary_device_add(&ras2_ctx->adev);
>> +	if (rc) {
>> +		auxiliary_device_uninit(&ras2_ctx->adev);
>> +		return rc;
>
>And this leaked everything you allocated above.
auxiliary_device_uninit() called on failure would trigger  ras2_release(),  which
free the allocated memories etc.
https://elixir.bootlin.com/linux/v6.17-rc5/source/drivers/base/auxiliary.c#L316

>
>> +	}
>> +
>> +	return 0;
>> +
>> +ida_free:
>> +	ida_free(&ras2_ida, id);
>> +ctx_free:
>> +	kfree(ras2_ctx);
>> +
>> +	return rc;
>> +}
>> +
>> +static int acpi_ras2_parse(struct acpi_table_ras2 *ras2_tab) {
>> +	struct acpi_ras2_pcc_desc *pcc_desc_list;
>> +	int rc;
>> +	u16 i;
>> +
>> +	if (ras2_tab->header.length < sizeof(*ras2_tab)) {
>> +		pr_warn(FW_WARN "ACPI RAS2 table present but broken (too
>short
>> +#1)\n");
>
>So dump the sizes to make the error message more helpful.
Sure.

>
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!ras2_tab->num_pcc_descs) {
>> +		pr_warn(FW_WARN "No PCC descs in ACPI RAS2 table\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	pcc_desc_list = (struct acpi_ras2_pcc_desc *)(ras2_tab + 1);
>> +	for (i = 0; i < ras2_tab->num_pcc_descs; i++, pcc_desc_list++) {
>> +		if (pcc_desc_list->feature_type != RAS2_FEAT_TYPE_MEMORY)
>> +			continue;
>> +
>> +		rc = ras2_add_aux_device(RAS2_MEM_DEV_ID_NAME,
>> +					 pcc_desc_list->channel_id,
>> +					 pcc_desc_list->instance);
>> +		if (rc)
>> +			pr_warn("Failed to add RAS2 auxiliary device rc=%d\n",
>rc);
>
>What happens with the aux devices you created successfully here? Unwind?
Please see the previous discussions on this were about allowing the successfully created
auxiliary devices to exist.
https://lore.kernel.org/all/20250415210504.GA854098@xxxxxxxxxxxxxxxxx/

>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +void __init acpi_ras2_init(void)
>> +{
>> +	struct acpi_table_ras2 *ras2_tab;
>> +	acpi_status status;
>> +
>> +	status = acpi_get_table(ACPI_SIG_RAS2, 0,
>> +				(struct acpi_table_header **)&ras2_tab);
>> +	if (ACPI_FAILURE(status)) {
>> +		pr_err("Failed to get table, %s\n",
>acpi_format_exception(status));
>> +		return;
>> +	}
>> +
>> +	if (acpi_ras2_parse(ras2_tab))
>> +		pr_err("Failed to parse RAS2 table\n");
>
>No need for that print if you issue warns above.
Will remove.
>
>> +
>> +	acpi_put_table((struct acpi_table_header *)ras2_tab); }
>
>--
>Regards/Gruss,
>    Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette


Thanks,
Shiju





[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]
  Powered by Linux