Hi Reinette,

On 3/21/25 17:58, Reinette Chatre wrote:
> Hi Babu,
>
> On 1/30/25 1:20 PM, Babu Moger wrote:
>> The io_alloc feature in resctrl is a mechanism that enables direct
>> insertion of data from I/O devices into the L3 cache.
>>
>> On AMD systems, the io_alloc feature is backed by SDCIAE (L3 Smart Data Cache
>> Injection Allocation Enforcement). When enabled, SDCIAE forces all SDCI
>> lines to be placed into the L3 cache partitions identified by the
>> highest-supported L3_MASK_n register, as reported by CPUID
>> Fn0000_0010_EDX_x1.MAX_COS. For example, if MAX_COS=15, SDCI lines will
>> be allocated into the L3 cache partitions determined by the bitmask in
>> the L3_MASK_15 register.
>>
>> When CDP is enabled, io_alloc routes I/O traffic using the highest CLOSID
>> allocated for the instruction cache.
>
> You can append a "L3CODE" to the above to help provide context on what resource
> is referred to as "instruction cache".

Sure.

>
>> Introduce an interface to enable/disable the "io_alloc" feature on user input.
>>
>> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
>> ---
>> v3: Rewrote the change to make it generic.
>>     Rewrote the documentation in resctrl.rst to be generic and added
>>     AMD feature details at the end.
>>     Added a check to verify MAX CLOSID availability on the system.
>>     Added a CDP check to make sure io_alloc is configured in CDP_CODE.
>>     Added resctrl_io_alloc_closid_free() to free the io_alloc CLOSID.
>>     Added errors in a few cases when CLOSID allocation fails.
>>     Fixed the splat reported when info/L3/bit_usage is accessed while
>>     io_alloc is enabled.
>>     https://lore.kernel.org/lkml/SJ1PR11MB60837B532254E7B23BC27E84FC052@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>>
>> v2: Renamed the feature to "io_alloc".
>>     Added generic text for the feature in the commit log and resctrl.rst doc.
>>     Added resctrl_io_alloc_init_cat() to initialize io_alloc to default
>>     values when enabled.
>>     Fixed io_alloc show functionality to display only on the L3 resource.
>> ---
>>  Documentation/arch/x86/resctrl.rst     |  34 ++++++
>>  arch/x86/kernel/cpu/resctrl/core.c     |   2 +
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 144 +++++++++++++++++++++++++
>>  3 files changed, 180 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 6768fc1fad16..1b67e31d626c 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -135,6 +135,40 @@ related to allocation:
>>  	"1":
>>  	      Non-contiguous 1s value in CBM is supported.
>>
>> +"io_alloc":
>> +	The "io_alloc" feature in resctrl enables system software to
>
> Since this is already resctrl documentation, "feature in resctrl" could be dropped
> to be just:
>   "io_alloc" enables system software ...

Sure.

>
>> +	configure the portion of the L3 cache allocated for I/O traffic.
>> +	By directly caching data from I/O devices rather than first storing
>> +	the I/O data in DRAM, reduces the demands on DRAM bandwidth and
>> +	reduces latency to the processor consuming the I/O data.
>
> hmmm ... it looks like "SDCIAE" was deleted from the earlier used (marketing?) text
> and the resulting text left as-is without re-checking whether it is still coherent.
> I do not think it is needed to motivate/market the feature here; perhaps the last
> sentence can just be dropped?

Yes. I will drop the last sentence.

>
>> +
>> +	The feature routes the I/O traffic via specific CLOSID reserved
>> +	for io_alloc feature. By configuring the CBM (Capacity Bit Mask)
>> +	for the CLOSID users can control the L3 portions available for
>> +	I/O traffic. When enabled, CLOSID reserved for the io_alloc will
>> +	not be available to the resctrl group.
>
> Although the above reflects how SDCIAE is implemented, it may not be true for how
> another architecture may support this. hmmm ... this sounds familiar, and looking
> back it is the same thing I mentioned in the V2 feedback; actually, in V2 I pointed
> to V1 feedback that said this also.
> If you insist on this text then please change the tone so that it indicates the
> behavior is optional. For example, "An architecture may support io_alloc by reserving
> a CLOSID to configure the ..."

Yes. Sure.

>
>> +	::
>> +
>> +	  # cat /sys/fs/resctrl/info/L3/io_alloc
>> +	  0
>
> Please refer to the cover letter about the proposal to use
> enabled/disabled/not supported instead.

Yes. Got it.

>
>> +
>> +	"0":
>> +	      io_alloc feature is not enabled.
>> +	"1":
>> +	      io_alloc feature is enabled, allowing users to manage
>> +	      the portions of the L3 cache allocated for the I/O device.
>> +
>> +	Feature can be enabled/disabled by writing to the interface.
>> +	Example::
>> +
>> +	  # echo 1 > /sys/fs/resctrl/info/L3/io_alloc
>> +
>> +	On AMD systems, the io_alloc feature is supported by the L3 Smart
>> +	Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for
>> +	io_alloc is determined by the highest CLOSID supported by the resource.
>> +	When CDP is enabled, io_alloc routes I/O traffic using the highest
>> +	CLOSID allocated for the instruction cache.
>> +
>>  Memory bandwidth(MB) subdirectory contains the following files
>>  with respect to allocation:
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 1ebdb2dcc009..88bc95c14ea8 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -309,6 +309,8 @@ static void rdt_get_cdp_config(int level)
>>  static void rdt_set_io_alloc_capable(struct rdt_resource *r)
>>  {
>>  	r->cache.io_alloc_capable = true;
>> +	resctrl_file_fflags_init("io_alloc",
>> +				 RFTYPE_CTRL_INFO | RFTYPE_RES_CACHE);
>>  }
>
> Some MPAM changes landed since you created this work. After the fs/arch split the
> architecture code should have no insight into the resctrl file flags. Please refer to
> the MPAM changes on how this can be managed. You can refer to thread_throttle_mode_init()
> and, similar to that, resctrl can use the io_alloc_capable flag to make the files visible.
Yes. This needs to change.

>
>>
>>  static void rdt_get_cdp_l3_config(void)
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index c5a0a31c3a85..37295dd14abe 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -62,6 +62,7 @@ static char last_cmd_status_buf[512];
>>
>>  static int rdtgroup_setup_root(struct rdt_fs_context *ctx);
>>  static void rdtgroup_destroy_root(void);
>> +static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid);
>>
>>  struct dentry *debugfs_resctrl;
>>
>> @@ -180,6 +181,19 @@ void closid_free(int closid)
>>  	__set_bit(closid, &closid_free_map);
>>  }
>>
>> +static int resctrl_io_alloc_closid_alloc(u32 io_alloc_closid)
>> +{
>> +	if (__test_and_clear_bit(io_alloc_closid, &closid_free_map))
>> +		return io_alloc_closid;
>> +	else
>> +		return -ENOSPC;
>> +}
>> +
>> +static void resctrl_io_alloc_closid_free(u32 io_alloc_closid)
>> +{
>> +	closid_free(io_alloc_closid);
>> +}
>> +
>>  /**
>>   * closid_allocated - test if provided closid is in use
>>   * @closid: closid to be tested
>> @@ -995,6 +1009,33 @@ static int rdt_shareable_bits_show(struct kernfs_open_file *of,
>>  	return 0;
>>  }
>>
>> +/*
>> + * io_alloc feature uses max CLOSID to route the IO traffic.
>> + * Get the max CLOSID and verify if the CLOSID is available.
>> + */
>> +static int resctrl_io_alloc_closid_get(struct rdt_resource *r,
>> +				       struct resctrl_schema *s)
>> +{
>> +	int num_closids = resctrl_arch_get_num_closid(r);
>> +
>> +	/*
>> +	 * The number of CLOSIDs is determined based on the minimum
>> +	 * supported across all resources (in closid_init). It is stored
>
> closid_init -> closid_init()

Sure.

>
>> +	 * in s->num_closids. Also, if CDP is enabled number of CLOSIDs
>> +	 * are halved. To enable io_alloc feature, the number of CLOSIDs
>> +	 * must match the maximum CLOSID supported by the resource.
>> +	 */
>> +	if (resctrl_arch_get_cdp_enabled(r->rid))
>> +		num_closids /= 2;
>> +
>> +	if (s->num_closid != num_closids) {
>
> Considering from schemata_list_add():
>
>   s->num_closid = resctrl_arch_get_num_closid(r);
>
> ... the above "if (s->num_closid != num_closids)" just compares the value to itself, no?
>
> This function does not actually take all resources into account with the above
> comparison. I think what you may need here is a comparison with closid_free_map_len?

Yeah. I need to change the logic here. The max CLOSID supported on the resource and
closid_free_map_len should match. Will fix it.

>
> As I understand it, it is still possible to use io_alloc when the resource's max CLOSID
> is not within closid_free_map; this is just not done, to simplify the implementation.

That is correct.

>
>> +		rdt_last_cmd_puts("Max CLOSID to support io_alloc is not available\n");
>> +		return -ENOSPC;
>> +	}
>> +
>> +	return num_closids - 1;
>> +}
>> +
>>  /*
>>   * rdt_bit_usage_show - Display current usage of resources
>>   *
>> @@ -1038,6 +1079,14 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
>>  	for (i = 0; i < closids_supported(); i++) {
>>  		if (!closid_allocated(i))
>>  			continue;
>> +		/*
>> +		 * If io_alloc is enabled, the CLOSID will be
>> +		 * allocated but will not be associated with any
>> +		 * groups. Skip in that case.
>
> This defeats the purpose of "bit_usage", which gives user space insight into
> how the cache is allocated. Instead of ignoring the portions of cache
> used for I/O, this should display to the user that these portions are
> used by/shared with hardware.

Yes. Will change it.
>
>> +		 */
>> +		if (i == resctrl_io_alloc_closid_get(r, s) &&
>> +		    resctrl_arch_get_io_alloc_enabled(r))
>> +			continue;
>>  		ctrl_val = resctrl_arch_get_config(r, dom, i,
>>  						   s->conf_type);
>>  		mode = rdtgroup_mode_by_closid(i);
>> @@ -1830,6 +1879,94 @@ int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
>>  	return 0;
>>  }
>>
>> +static int resctrl_io_alloc_show(struct kernfs_open_file *of,
>> +				 struct seq_file *seq, void *v)
>> +{
>> +	struct resctrl_schema *s = of->kn->parent->priv;
>> +	struct rdt_resource *r = s->res;
>> +
>> +	seq_printf(seq, "%x\n", resctrl_arch_get_io_alloc_enabled(r));
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Initialize io_alloc CLOSID cache resource with default CBM values.
>> + */
>> +static int resctrl_io_alloc_init_cat(struct rdt_resource *r,
>> +				     struct resctrl_schema *s, u32 closid)
>> +{
>> +	int ret;
>> +
>> +	rdt_staged_configs_clear();
>> +
>> +	ret = rdtgroup_init_cat(s, closid);
>> +	if (ret < 0)
>> +		goto out_init_cat;
>> +
>> +	ret = resctrl_arch_update_domains(r, closid);
>> +
>> +out_init_cat:
>> +	rdt_staged_configs_clear();
>> +	return ret;
>> +}
>> +
>> +static ssize_t resctrl_io_alloc_write(struct kernfs_open_file *of, char *buf,
>> +				      size_t nbytes, loff_t off)
>> +{
>> +	struct resctrl_schema *s = of->kn->parent->priv;
>> +	struct rdt_resource *r = s->res;
>> +	u32 io_alloc_closid;
>> +	bool enable;
>> +	int ret;
>> +
>> +	if (!r->cache.io_alloc_capable || s->conf_type == CDP_DATA) {
>> +		rdt_last_cmd_puts("io_alloc feature is not supported on the resource\n");
>
> rdt_last_cmd_puts() starts with lockdep_assert_held(&rdtgroup_mutex); also expect
> rdt_last_cmd_clear() before first use.

Yes.

>
>> +		return -EINVAL;
>
> Could ENODEV be used instead?

Sure.
>
>> +	}
>> +
>> +	ret = kstrtobool(buf, &enable);
>> +	if (ret)
>> +		return ret;
>> +
>> +	cpus_read_lock();
>> +	mutex_lock(&rdtgroup_mutex);
>> +
>> +	rdt_last_cmd_clear();
>> +
>> +	io_alloc_closid = resctrl_io_alloc_closid_get(r, s);
>> +	if (io_alloc_closid < 0) {
>
> Could you please add an informative message in last_cmd_status? It may be
> possible for the user to remedy this and retry.

Yes.

>
>> +		ret = -EINVAL;
>> +		goto out_io_alloc;
>> +	}
>> +
>> +	if (resctrl_arch_get_io_alloc_enabled(r) != enable) {
>> +		if (enable) {
>> +			ret = resctrl_io_alloc_closid_alloc(io_alloc_closid);
>> +			if (ret < 0) {
>> +				rdt_last_cmd_puts("CLOSID for io_alloc is not available\n");
>
> If the CLOSID is not available then it may be possible for the user to remedy this
> by removing a resource group and retrying the operation. Since the CLOSID is not
> useful to user space (and is x86 architecture specific) this could be improved to
> give guidance to user space about which resource group (by name, not CLOSID) is
> preventing this from succeeding.

Sure.

>
> (this sounded familiar; it looks like I provided the same feedback on V2, to which
> you responded "Yes. We can do that.")

Yes. Clearly, I missed that.

>
>> +				goto out_io_alloc;
>> +			}
>> +			ret = resctrl_io_alloc_init_cat(r, s, io_alloc_closid);
>> +			if (ret) {
>> +				rdt_last_cmd_puts("Failed to initialize io_alloc allocations\n");
>> +				resctrl_io_alloc_closid_free(io_alloc_closid);
>> +				goto out_io_alloc;
>> +			}
>> +
>> +		} else {
>> +			resctrl_io_alloc_closid_free(io_alloc_closid);
>> +		}
>> +
>> +		ret = resctrl_arch_io_alloc_enable(r, enable);
>> +	}
>> +
>> +out_io_alloc:
>> +	mutex_unlock(&rdtgroup_mutex);
>> +	cpus_read_unlock();
>> +
>> +	return ret ?: nbytes;
>> +}
>> +
>>  /* rdtgroup information files for one cache resource. */
>>  static struct rftype res_common_files[] = {
>>  	{
>> @@ -1982,6 +2119,13 @@ static struct rftype res_common_files[] = {
>>  		.seq_show	= rdtgroup_schemata_show,
>>  		.fflags		= RFTYPE_CTRL_BASE,
>>  	},
>> +	{
>> +		.name		= "io_alloc",
>> +		.mode		= 0644,
>> +		.kf_ops		= &rdtgroup_kf_single_ops,
>> +		.seq_show	= resctrl_io_alloc_show,
>> +		.write		= resctrl_io_alloc_write,
>> +	},
>>  	{
>>  		.name		= "mba_MBps_event",
>>  		.mode		= 0644,
>
> Reinette

--
Thanks
Babu Moger