On Tue, Sep 02, 2025 at 06:30:39PM +0100, shiju.jose@xxxxxxxxxx wrote: > From: Shiju Jose <shiju.jose@xxxxxxxxxx> > > Add support for ACPI RAS2 feature table (RAS2) defined in the > ACPI 6.5 Specification, section 5.2.21. > Driver defines RAS2 Init, which extracts the RAS2 table and driver What is "RAS2 Init"? $ grep -EriIn "ras2.*init" ras2.mbox 188: - Rename ras2_acpi_init() to acpi_ras2_init() and modified to call acpi_ras2_init() 283:Driver defines RAS2 Init, which extracts the RAS2 table and driver 391:+ acpi_ras2_init(); 410:+ * Driver contains ACPI RAS2 init, which extracts the ACPI RAS2 table and 774:+void __init acpi_ras2_init(void) ... I guess you mean acpi_ras2_init(). > adds auxiliary device for each memory feature which binds to the > RAS2 memory driver. But if you mean that, why would you explain what the patch does? That should be visible. What you should explaining is the "why". > Driver uses PCC mailbox to communicate with the ACPI HW and the > driver adds OSPM interfaces to send RAS2 commands. More useless "what". > According to the ACPI specification rev 6.5, section 5.2.21.1.1 > RAS2 Platform Communication Channel Descriptor, “RAS2 supports multiple > PCC channels, where a channel is dedicated to a given component > instance.” Thus, RAS2 driver has been implemented to support only > systems that comply with the specification, i.e. a dedicated PCC channel > per system component instance for communication. Why is that paragraph here? > ACPI specification rev 6.5, section 5.2.21.1.1 Table 5.80: RAS2 Platform > Communication Channel Descriptor format, defines Field: Instance, > Identifier for the system component instance that the RAS feature is > associated with. More word salad. > Section 5.2.21.2.1 Hardware-based Memory Scrubbing describes as > The platform can use this feature to expose controls and capabilities > associated with hardware-based memory scrub engines. Modern scalable > platforms have complex memory systems with a multitude of memory > controllers that are in turn associated with NUMA domains. It is also > common for RAS errors related to memory to be associated with NUMA > domains, where the NUMA domain functions as a FRU identifier. This > feature thus provides memory scrubbing at a NUMA domain granularity. More word salad. Why is this here at all?! > The following are supported: > 1. Independent memory scrubbing controls for each NUMA domain, identified > using its proximity domain. > 2. Provision for background (patrol) scrubbing of the entire memory > system, as well as on-demand scrubbing for a specific region of memory. > > Thus, the RAS2 driver requires the lowest contiguous physical memory range > of the memory associated with a NUMA domain when communicating with the > firmware for memory-related features such as scrubbing. The driver uses > the component instance ID, as defined in Table 5.80, to retrieve the > lowest contiguous physical memory address range within the NUMA node > and store it in the struct ras2_context to expose the address range to > the userspace and for the communication with the firmware. This commit message needs serious rewriting and I don't even know what it should contain. It is a dump of random text which reads like it is trying to tell me something about this new driver but I'm failing to grok what exactly... > Co-developed-by: A Somasundaram <somasundaram.a@xxxxxxx> > Signed-off-by: A Somasundaram <somasundaram.a@xxxxxxx> > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> > Tested-by: Daniel Ferguson <danielf@xxxxxxxxxxxxxxxxxxxxxx> > Signed-off-by: Shiju Jose <shiju.jose@xxxxxxxxxx> > --- > drivers/acpi/Kconfig | 12 ++ > drivers/acpi/Makefile | 1 + > drivers/acpi/bus.c | 3 + > drivers/acpi/ras2.c | 389 ++++++++++++++++++++++++++++++++++++++++++ > include/acpi/ras2.h | 63 +++++++ > 5 files changed, 468 insertions(+) > create mode 100644 drivers/acpi/ras2.c > create mode 100644 include/acpi/ras2.h > > diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig > index b594780a57d7..db21bf5a39c7 100644 > --- a/drivers/acpi/Kconfig > +++ b/drivers/acpi/Kconfig > @@ -293,6 +293,18 @@ config ACPI_CPPC_LIB > If your platform does not support CPPC in firmware, > leave this option disabled. > > +config ACPI_RAS2 > + bool "ACPI RAS2 driver" > + select AUXILIARY_BUS > + select MAILBOX > + select PCC > + select NUMA_KEEP_MEMINFO if NUMA_MEMBLKS Why is this selecting crap instead of depending on the facilities it uses? > + help > + The driver adds support for ACPI RAS2 feature table (extracts RAS2 > + table from OS system table) and OSPM interfaces to send RAS2 > + commands via PCC mailbox subspace. Driver adds platform device for > + the RAS2 memory features which binds to the RAS2 memory driver. This help text is useless and is throwing random abbreviations around like it ain't no tomorrow. Also, in all your text: use definite or indefinite articles because it reads weird otherwise. > + > config ACPI_PROCESSOR > tristate "Processor" > depends on X86 || ARM64 || LOONGARCH || RISCV > diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile > index d1b0affb844f..abfec6745724 100644 > --- a/drivers/acpi/Makefile > +++ b/drivers/acpi/Makefile > @@ -105,6 +105,7 @@ obj-$(CONFIG_ACPI_EC_DEBUGFS) += ec_sys.o > obj-$(CONFIG_ACPI_BGRT) += bgrt.o > obj-$(CONFIG_ACPI_CPPC_LIB) += cppc_acpi.o > obj-$(CONFIG_ACPI_SPCR_TABLE) += spcr.o > +obj-$(CONFIG_ACPI_RAS2) += ras2.o > obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o > obj-$(CONFIG_ACPI_PPTT) += pptt.o > obj-$(CONFIG_ACPI_PFRUT) += pfr_update.o pfr_telemetry.o > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c > index a984ccd4a2a0..b02ceb2837c6 100644 > --- a/drivers/acpi/bus.c > +++ b/drivers/acpi/bus.c > @@ -31,6 +31,7 @@ > #include <acpi/apei.h> > #include <linux/suspend.h> > #include <linux/prmt.h> > +#include <acpi/ras2.h> > > #include "internal.h" > > @@ -1474,6 +1475,8 @@ static int __init acpi_init(void) > acpi_debugger_init(); > acpi_setup_sb_notify_handler(); > acpi_viot_init(); > + acpi_ras2_init(); > + > return 0; > } > > diff --git a/drivers/acpi/ras2.c b/drivers/acpi/ras2.c > new file mode 100644 > index 000000000000..a17eab9eecf1 > --- /dev/null > +++ b/drivers/acpi/ras2.c > @@ -0,0 +1,389 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Implementation of ACPI RAS2 driver. > + * > + * Copyright (c) 2024-2025 HiSilicon Limited. > + * > + * Support for RAS2 - ACPI 6.5 Specification, section 5.2.21 > + * > + * Driver contains ACPI RAS2 init, which extracts the ACPI RAS2 table and > + * get the PCC channel subspace for communicating with the ACPI compliant > + * HW platform which supports ACPI RAS2. Driver adds auxiliary devices > + * for each RAS2 memory feature which binds to the memory ACPI RAS2 driver. For whom is this "text" here and what is its purpose? It'd be more helpful if you at least explained what all those abbreviations meant at the top here, so that people can go find out about them if needed. > + */ > + > +#define pr_fmt(fmt) "ACPI RAS2: " fmt > + > +#include <linux/delay.h> > +#include <linux/export.h> > +#include <linux/iopoll.h> > +#include <linux/ktime.h> > +#include <acpi/pcc.h> > +#include <acpi/ras2.h> > + > +/** > + * struct ras2_pcc_subspace - Data structure for PCC communication > + * @mbox_client: struct mbox_client object > + * @pcc_chan: Pointer to struct pcc_mbox_chan > + * @comm_addr: Pointer to RAS2 PCC shared memory region > + * @elem: List for registered RAS2 PCC channel subspaces > + * @pcc_lock: PCC lock to provide mutually exclusive access > + * to PCC channel subspace > + * @deadline_us: Poll PCC status register timeout in micro secs > + * for PCC command complete > + * @pcc_mpar: Maximum Periodic Access Rate(MPAR) for PCC channel > + * @pcc_mrtt: Minimum Request Turnaround Time(MRTT) in micro secs > + * OS must wait after completion of a PCC command before > + * issue next command > + * @last_cmd_cmpl_time: completion time of last PCC command > + * @last_mpar_reset: Time of last MPAR count reset > + * @mpar_count: MPAR count > + * @pcc_id: Identifier of the RAS2 platform communication channel > + * @last_cmd: Last PCC command > + * @pcc_chnl_acq: Status of PCC channel acquired > + */ > +struct ras2_pcc_subspace { > + struct mbox_client mbox_client; > + struct pcc_mbox_chan *pcc_chan; > + struct acpi_ras2_shmem __iomem *comm_addr; > + struct list_head elem; > + struct mutex pcc_lock; > + unsigned int deadline_us; > + unsigned int pcc_mpar; > + unsigned int pcc_mrtt; > + ktime_t last_cmd_cmpl_time; > + ktime_t last_mpar_reset; > + int mpar_count; > + int pcc_id; > + u16 last_cmd; > + bool pcc_chnl_acq; > +}; > + > +/* > + * Arbitrary retries for PCC commands because the remote processor > + * could be much slower to reply. Keeping it high enough to cover > + * emulators where the processors run painfully slow. > + */ > +#define RAS2_NUM_RETRIES 600ULL Why doesn't it have "PCC" in the name then? > +#define RAS2_FEAT_TYPE_MEMORY 0x00 > + > +static int ras2_report_cap_error(u32 cap_status) All static functions do not need a "ras2_" namespace. Also, this function is not reporting anything. Function naming *is* important. > +{ > + switch (cap_status) { > + case ACPI_RAS2_NOT_VALID: > + case ACPI_RAS2_NOT_SUPPORTED: > + return -EPERM; > + case ACPI_RAS2_BUSY: > + return -EBUSY; > + case ACPI_RAS2_FAILED: > + case ACPI_RAS2_ABORTED: > + case ACPI_RAS2_INVALID_DATA: > + return -EINVAL; > + default: /* 0 or other, Success */ Comments go ontop, not sideways. And this comment is useless too. > + return 0; > + } > +} > + > +static int ras2_check_pcc_chan(struct ras2_pcc_subspace *pcc_subspace) struct ras2_pcc_subspace *sspc or so, so that you don't have insane long lines. > +{ > + struct acpi_ras2_shmem __iomem *gen_comm_base = pcc_subspace->comm_addr; > + u32 cap_status, rc; > + u16 status; > + > + /* > + * As per ACPI spec, the PCC space will be initialized by > + * platform and should have set the command completion bit when > + * PCC can be used by OSPM. > + * > + * Poll PCC status register every 3us(delay_us) for maximum of delay_us? > + * deadline_us(timeout_us) timeout_us? > until PCC command complete bit is set(cond). cond? > + */ > + rc = readw_relaxed_poll_timeout(&gen_comm_base->status, status, > + status & PCC_STATUS_CMD_COMPLETE, 3, > + pcc_subspace->deadline_us); > + if (rc) { This function returns an int, you store it into a u32, caller checks for rc < 0... What a mess. :-( > + pr_warn("PCC check channel timeout for pcc_id=%d rc=%d\n", > + pcc_subspace->pcc_id, rc); > + return rc; > + } > + > + if (status & PCC_STATUS_ERROR) { > + pr_warn("Error in executing last command=%d for pcc_id=%d\n", > + pcc_subspace->last_cmd, pcc_subspace->pcc_id); > + status &= ~PCC_STATUS_ERROR; > + writew_relaxed(status, &gen_comm_base->status); > + return -EIO; > + } > + > + cap_status = readw_relaxed(&gen_comm_base->set_caps_status); > + writew_relaxed(0x0, &gen_comm_base->set_caps_status); > + return ras2_report_cap_error(cap_status); > +} > + > +/** > + * ras2_send_pcc_cmd() - Send RAS2 command via PCC channel > + * @ras2_ctx: pointer to the RAS2 context structure > + * @cmd: command to send > + * > + * Returns: 0 on success, an error otherwise > + */ > +int ras2_send_pcc_cmd(struct ras2_mem_ctx *ras2_ctx, u16 cmd) > +{ > + struct ras2_pcc_subspace *pcc_subspace = ras2_ctx->pcc_subspace; > + struct acpi_ras2_shmem __iomem *gen_comm_base = pcc_subspace->comm_addr; > + struct mbox_chan *pcc_channel; > + unsigned int time_delta; > + int rc; > + > + rc = ras2_check_pcc_chan(pcc_subspace); > + if (rc < 0) > + return rc; > + > + pcc_channel = pcc_subspace->pcc_chan->mchan; > + > + /* > + * Handle the Minimum Request Turnaround Time(MRTT). ^ In all your text: put a space before the abbreviation in brackets. > + * "The minimum amount of time that OSPM must wait after the completion > + * of a command before issuing the next command, in microseconds." > + */ > + if (pcc_subspace->pcc_mrtt) { > + time_delta = ktime_us_delta(ktime_get(), > + pcc_subspace->last_cmd_cmpl_time); > + if (pcc_subspace->pcc_mrtt > time_delta) > + udelay(pcc_subspace->pcc_mrtt - time_delta); > + } > + > + /* > + * Handle the non-zero Maximum Periodic Access Rate(MPAR). > + * "The maximum number of periodic requests that the subspace channel can > + * support, reported in commands per minute. 0 indicates no limitation." > + * > + * This parameter should be ideally zero or large enough so that it can > + * handle maximum number of requests that all the cores in the system can > + * collectively generate. If it is not, we will follow the spec and just s/we will// > + * not send the request to the platform after hitting the MPAR limit in > + * any 60s window. > + */ > + if (pcc_subspace->pcc_mpar) { > + if (pcc_subspace->mpar_count == 0) { > + time_delta = ktime_ms_delta(ktime_get(), > + pcc_subspace->last_mpar_reset); > + if (time_delta < 60 * MSEC_PER_SEC) { > + dev_dbg(ras2_ctx->dev, > + "PCC cmd not sent due to MPAR limit"); you can dump which cmd here. > + return -EIO; > + } > + pcc_subspace->last_mpar_reset = ktime_get(); > + pcc_subspace->mpar_count = pcc_subspace->pcc_mpar; > + } > + pcc_subspace->mpar_count--; > + } > + > + /* Write to the shared comm region */ > + writew_relaxed(cmd, &gen_comm_base->command); > + > + /* Flip CMD COMPLETE bit */ > + writew_relaxed(0, &gen_comm_base->status); > + > + /* Ring doorbell */ > + rc = mbox_send_message(pcc_channel, &cmd); > + if (rc < 0) { > + dev_warn(ras2_ctx->dev, > + "Err sending PCC mbox message. cmd:%d, rc:%d\n", > + cmd, rc); > + return rc; > + } > + > + pcc_subspace->last_cmd = cmd; > + > + /* > + * If Minimum Request Turnaround Time is non-zero, we need > + * to record the completion time of both READ and WRITE s/we need to// You get the idea... > + * command for proper handling of MRTT, so we need to check s/we need to// > + * for pcc_mrtt in addition to CMD_READ. > + */ > + if (cmd == PCC_CMD_EXEC_RAS2 || pcc_subspace->pcc_mrtt) { > + rc = ras2_check_pcc_chan(pcc_subspace); > + if (pcc_subspace->pcc_mrtt) > + pcc_subspace->last_cmd_cmpl_time = ktime_get(); > + } > + > + if (pcc_channel->mbox->txdone_irq) > + mbox_chan_txdone(pcc_channel, rc); > + else > + mbox_client_txdone(pcc_channel, rc); > + > + return rc < 0 ? rc : 0; > +} > +EXPORT_SYMBOL_GPL(ras2_send_pcc_cmd); > + > +static void ras2_list_pcc_release(struct ras2_pcc_subspace *pcc_subspace) > +{ > + pcc_mbox_free_channel(pcc_subspace->pcc_chan); > + kfree(pcc_subspace); > +} > + > +static int ras2_register_pcc_channel(struct ras2_mem_ctx *ras2_ctx, int pcc_id) > +{ > + struct ras2_pcc_subspace *pcc_subspace; > + struct pcc_mbox_chan *pcc_chan; > + struct mbox_client *mbox_cl; > + > + if (pcc_id < 0) > + return -EINVAL; > + > + pcc_subspace = kzalloc(sizeof(*pcc_subspace), GFP_KERNEL); > + if (!pcc_subspace) > + return -ENOMEM; > + > + mbox_cl = &pcc_subspace->mbox_client; > + mbox_cl->knows_txdone = true; > + > + pcc_chan = pcc_mbox_request_channel(mbox_cl, pcc_id); > + if (IS_ERR(pcc_chan)) { > + kfree(pcc_subspace); > + return PTR_ERR(pcc_chan); > + } > + > + pcc_subspace->pcc_id = pcc_id; > + pcc_subspace->pcc_chan = pcc_chan; > + pcc_subspace->comm_addr = pcc_chan->shmem; > + pcc_subspace->deadline_us = RAS2_NUM_RETRIES * pcc_chan->latency; > + pcc_subspace->pcc_mrtt = pcc_chan->min_turnaround_time; > + pcc_subspace->pcc_mpar = pcc_chan->max_access_rate; > + pcc_subspace->mbox_client.knows_txdone = true; > + pcc_subspace->pcc_chnl_acq = true; > + > + ras2_ctx->pcc_subspace = pcc_subspace; > + ras2_ctx->comm_addr = pcc_subspace->comm_addr; > + ras2_ctx->dev = pcc_chan->mchan->mbox->dev; > + > + mutex_init(&pcc_subspace->pcc_lock); > + ras2_ctx->pcc_lock = &pcc_subspace->pcc_lock; > + > + return 0; > +} > + > +static DEFINE_IDA(ras2_ida); > +static void ras2_release(struct device *device) > +{ > + struct auxiliary_device *auxdev = to_auxiliary_dev(device); > + struct ras2_mem_ctx *ras2_ctx = > + container_of(auxdev, struct ras2_mem_ctx, adev); > + > + ida_free(&ras2_ida, auxdev->id); > + ras2_list_pcc_release(ras2_ctx->pcc_subspace); > + kfree(ras2_ctx); > +} > + > +static int ras2_add_aux_device(char *name, int channel, u32 pxm_inst) > +{ > + unsigned long start_pfn, size_pfn; > + struct ras2_mem_ctx *ras2_ctx; > + int id, rc; > + > + ras2_ctx = kzalloc(sizeof(*ras2_ctx), GFP_KERNEL); > + if (!ras2_ctx) > + return -ENOMEM; > + > + ras2_ctx->sys_comp_nid = pxm_to_node(pxm_inst); <---- newline here. > + /* > + * Retrieve the lowest contiguous physical memory address range within > + * the NUMA node. > + */ Why is this requirement here? The commit message is trying to explain it but I'm still none the wiser. > + start_pfn = node_start_pfn(ras2_ctx->sys_comp_nid); > + size_pfn = node_spanned_pages(ras2_ctx->sys_comp_nid); > + if (!size_pfn) { > + pr_debug("Failed to find phy addr range for NUMA node(%u)\n", > + pxm_inst); > + goto ctx_free; > + } > + ras2_ctx->mem_base_addr = __pfn_to_phys(start_pfn); > + ras2_ctx->mem_size = __pfn_to_phys(size_pfn); > + > + rc = ras2_register_pcc_channel(ras2_ctx, channel); > + if (rc < 0) { > + pr_debug("Failed to register pcc channel rc=%d\n", rc); > + goto ctx_free; > + } > + > + id = ida_alloc(&ras2_ida, GFP_KERNEL); > + if (id < 0) { > + rc = id; > + goto ctx_free; You just leaked pcc_subspace which ras2_register_pcc_channel() allocated. > + } > + > + ras2_ctx->adev.id = id; > + ras2_ctx->adev.name = RAS2_MEM_DEV_ID_NAME; > + ras2_ctx->adev.dev.release = ras2_release; > + ras2_ctx->adev.dev.parent = ras2_ctx->dev; > + > + rc = auxiliary_device_init(&ras2_ctx->adev); > + if (rc) > + goto ida_free; > + > + rc = auxiliary_device_add(&ras2_ctx->adev); > + if (rc) { > + auxiliary_device_uninit(&ras2_ctx->adev); > + return rc; And this leaked everything you allocated above. > + } > + > + return 0; > + > +ida_free: > + ida_free(&ras2_ida, id); > +ctx_free: > + kfree(ras2_ctx); > + > + return rc; > +} > + > +static int acpi_ras2_parse(struct acpi_table_ras2 *ras2_tab) > +{ > + struct acpi_ras2_pcc_desc *pcc_desc_list; > + int rc; > + u16 i; > + > + if (ras2_tab->header.length < sizeof(*ras2_tab)) { > + pr_warn(FW_WARN "ACPI RAS2 table present but broken (too short #1)\n"); So dump the sizes to make the error message more helpful. > + return -EINVAL; > + } > + > + if (!ras2_tab->num_pcc_descs) { > + pr_warn(FW_WARN "No PCC descs in ACPI RAS2 table\n"); > + return -EINVAL; > + } > + > + pcc_desc_list = (struct acpi_ras2_pcc_desc *)(ras2_tab + 1); > + for (i = 0; i < ras2_tab->num_pcc_descs; i++, pcc_desc_list++) { > + if (pcc_desc_list->feature_type != RAS2_FEAT_TYPE_MEMORY) > + continue; > + > + rc = ras2_add_aux_device(RAS2_MEM_DEV_ID_NAME, > + pcc_desc_list->channel_id, > + pcc_desc_list->instance); > + if (rc) > + pr_warn("Failed to add RAS2 auxiliary device rc=%d\n", rc); What happens with the aux devices you created successfully here? Unwind? > + } > + > + return 0; > +} > + > +void __init acpi_ras2_init(void) > +{ > + struct acpi_table_ras2 *ras2_tab; > + acpi_status status; > + > + status = acpi_get_table(ACPI_SIG_RAS2, 0, > + (struct acpi_table_header **)&ras2_tab); > + if (ACPI_FAILURE(status)) { > + pr_err("Failed to get table, %s\n", acpi_format_exception(status)); > + return; > + } > + > + if (acpi_ras2_parse(ras2_tab)) > + pr_err("Failed to parse RAS2 table\n"); No need for that print if you issue warns above. > + > + acpi_put_table((struct acpi_table_header *)ras2_tab); > +} -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette