Hi Reinette, On 6/17/25 22:59, Reinette Chatre wrote: > Hi Babu, > > On 6/11/25 2:23 PM, Babu Moger wrote: >> The io_alloc feature in resctrl is a mechanism that enables direct > > "The ... feature ... is a mechanism"? What does it mean when a feature > is a mechanism? How about just "The io_alloc feature in resctrl enables ..."? Sure. > >> insertion of data from I/O devices into the L3 cache. > > Drop "L3"? > Sure. >> >> On AMD systems, io_alloc feature is backed by SDCIAE (L3 Smart Data Cache >> Injection Allocation Enforcement). When enabled, SDCIAE forces all SDCI >> lines to be placed into the L3 cache partitions identified by the >> highest-supported L3_MASK_n register as reported by CPUID >> Fn0000_0010_EDX_x1.MAX_COS. For example, if MAX_COS=15, SDCI lines will >> be allocated into the L3 cache partitions determined by the bitmask in >> the L3_MASK_15 register. > > This is a resctrl fs patch but most of changelog so far is about AMD's implementation > details. I do think it is relevant, but all the register details can > probably be dropped since it is too low level to support the goal. I believe the > goal here is to motivate resctrl fs's implementation that needs to pick highest > CLOSID. So, essentially the changelog needs to hightlight that AMD's implementation > requires highest CLOSID and without any other reference that is what resctrl fs's > implementation supports. I think it is important to highlight that the > user interface is not tied to this implementation decision to avoid future issues > if another architecture support "io_alloc" "differently". Internals of > resctrl fs can then be changed. > Sure. Will split these patches. Will try make the text more generic. >> >> When CDP is enabled, io_alloc routes I/O traffic using the highest CLOSID >> allocated for the instruction cache (L3CODE). > > Again, this is a resctrl fs patch and above is an AMD implementation detail. resctrl > fs should not be limited by how AMD implements the feature but can use it as first > reference. Sure. > >> >> Introduce user interface to enable/disable "io_alloc" feature. > > This patch does more than this. Sure. > >> >> Signed-off-by: Babu Moger <babu.moger@xxxxxxx> >> --- > ... > >> --- >> Documentation/filesystems/resctrl.rst | 34 ++++ >> fs/resctrl/rdtgroup.c | 216 +++++++++++++++++++++++++- > > This patch contains several logical changes that are not adequately described > in changelog. > I think the following can be separate patches: > - rdt_bit_usage_show() change > - io_alloc_init() definition and usage > - show() and write() helpers > - possibly the io_alloc_closid helpers (more later) Yes. Splitting this into 3 patches. 1. rdt_bit_usage_show() change Updates shareable_bits section. It calls - resctrl_io_alloc_closid(). 2. io_alloc_init() definition and usage Introdce the inteface and add show() 3. Add io_alloc write(). Calls resctrl_io_alloc_closid_supported() and resctrl_io_alloc_closid_get(). > >> 2 files changed, 248 insertions(+), 2 deletions(-) >> >> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst >> index c7949dd44f2f..03c829b2c276 100644 >> --- a/Documentation/filesystems/resctrl.rst >> +++ b/Documentation/filesystems/resctrl.rst >> @@ -95,6 +95,11 @@ related to allocation: >> some platforms support devices that have their >> own settings for cache use which can over-ride >> these bits. >> + >> + When the "io_alloc" feature is enabled, a portion of the cache >> + is reserved for shared use between hardware and software. Refer > > "reserved" and "shared" sounds like a contradiction. How about "is reserved" -> > "can be configured"? > >> + to "bit_usage" to see which portion is allocated for this purpose. > > This is the "shareable_bits" section and the reason why user is pointed to > "bit_usage" is because "shareable_bits" is a global CBM that applies to all > cache domains/instances while "bit_usage" is per cache domain/instance. I think > it will be helpful to point this out to the user. > Perhaps second sentence can be replaced with something like: > ""bit_usage" should be used to see which portions of each cache instance > is configured for hardware use via the "io_alloc" feature because every cache > instance can have its "io_alloc" bitmask configured independently". Sure. ""bit_usage" should be used to see which portions of each cache instance is configured for hardware use via the "io_alloc" feature because every cache instance can have its "io_alloc" bitmask configured independently. > > Please do improve this. > > To complete this the first part of the "shareable_bits" doc can be amended: > "Bitmask of shareable resource" -> "Bitmask (applicable to all instances of this resource) of shareable resource" > What do you think? Sure. Added one sentance for that. Bitmask of shareable resource with other executing entities (e.g. I/O). Applies to all instances of this resource. > >> + >> "bit_usage": >> Annotated capacity bitmasks showing how all >> instances of the resource are used. The legend is: >> @@ -135,6 +140,35 @@ related to allocation: >> "1": >> Non-contiguous 1s value in CBM is supported. >> >> +"io_alloc": >> + The "io_alloc" enables system software to configure the portion > > "The "io_alloc" enables"? Maybe just ""io_alloc" enables"? > >> + of the L3 cache allocated for I/O traffic. > > Drop "L3"? > > Can append, for example, "File may only exist if the system supports this feature on some of its cache > resources". > >> + >> + The feature routes the I/O traffic via specific CLOSID reserved >> + for io_alloc feature. By configuring the CBM (Capacity Bit Mask) >> + for the CLOSID, users can control the L3 portions available for >> + I/0 traffic. The reserved CLOSID will be excluded for group creation. > > Looking back I've commented *four* times already about how resctrl fs user interface > should not be made specific to AMD's implementation. > This paragraph just be dropped? > > > The rest can be something like: > > "disabled": Portions of cache used for allocation of I/O traffic cannot be configured. > "enabled": Portions of cache used for allocation of I/O traffic can be configured using "io_alloc_cbm" > "not supported": ... > > The underlying implementation may reduce resources available to > general (CPU) cache allocation. See architecture specific notes below. > Depending on usage requirements the feature can be enabled or disabled: > > To enable: > # echo 1 > /sys/fs/resctrl/info/L3/io_alloc > > To disable: > # echo 0 > /sys/fs/resctrl/info/L3/io_alloc Sure. > >> + >> + The interface provides a means to query the status of the feature. >> + >> + Example:: >> + >> + # cat /sys/fs/resctrl/info/L3/io_alloc >> + disabled >> + >> + Feature can be enabled/disabled by writing to the interface. >> + Example:: >> + >> + # echo 1 > /sys/fs/resctrl/info/L3/io_alloc >> + # cat /sys/fs/resctrl/info/L3/io_alloc >> + enabled >> + >> + On AMD systems, the io_alloc feature is supported by the L3 Smart >> + Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for >> + io_alloc is determined by the highest CLOSID supported by the resource. >> + When CDP is enabled, io_alloc routes I/O traffic using the highest >> + CLOSID allocated for the instruction cache (L3CODE). >> + >> Memory bandwidth(MB) subdirectory contains the following files >> with respect to allocation: >> >> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c >> index 1beb124e25f6..bbc032b4d0e9 100644 >> --- a/fs/resctrl/rdtgroup.c >> +++ b/fs/resctrl/rdtgroup.c >> @@ -70,6 +70,7 @@ static struct seq_buf last_cmd_status; >> static char last_cmd_status_buf[512]; >> >> static int rdtgroup_setup_root(struct rdt_fs_context *ctx); >> +static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid); >> >> static void rdtgroup_destroy_root(void); >> >> @@ -232,6 +233,19 @@ bool closid_allocated(unsigned int closid) >> return !test_bit(closid, closid_free_map); >> } >> >> +static int resctrl_io_alloc_closid_alloc(u32 io_alloc_closid) >> +{ >> + if (__test_and_clear_bit(io_alloc_closid, closid_free_map)) >> + return io_alloc_closid; >> + else >> + return -ENOSPC; >> +} >> + >> +static void resctrl_io_alloc_closid_free(u32 io_alloc_closid) >> +{ >> + closid_free(io_alloc_closid); >> +} >> + >> /** >> * rdtgroup_mode_by_closid - Return mode of resource group with closid >> * @closid: closid if the resource group >> @@ -1030,6 +1044,29 @@ static int rdt_shareable_bits_show(struct kernfs_open_file *of, >> return 0; >> } >> >> +/* >> + * resctrl_io_alloc_closid_get - io_alloc feature uses max CLOSID to route >> + * the IO traffic. Get the max CLOSID and verify if the CLOSID is available. > > The function name should be followed by a *brief* description. Sure. > >> + * >> + * The total number of CLOSIDs is determined in closid_init(), based on the >> + * minimum supported across all resources. If CDP (Code Data Prioritization) >> + * is enabled, the number of CLOSIDs is halved. The final value is returned >> + * by closids_supported(). Make sure this value aligns with the maximum >> + * CLOSID supported by the respective resource. > > All but the last sentence is unrelated to this function and the last sentence > is very vague. What is "this value" that it refers to? Removed it. > >> + */ >> +static int resctrl_io_alloc_closid_get(struct rdt_resource *r) >> +{ >> + int num_closids = closids_supported(); >> + >> + if (resctrl_arch_get_cdp_enabled(r->rid)) >> + num_closids *= 2; >> + >> + if (num_closids != resctrl_arch_get_num_closid(r)) >> + return -ENOSPC; >> + >> + return closids_supported() - 1; >> +} > > resctrl_io_alloc_closid_get() seems to be trying to do two things: > - determine what the io_alloc_closid is > - make sure the io_alloc_closid is supported > > I think this should be split into two functions. Once the > io_alloc_closid is determined to be supported and io_alloc > enabled then there is no reason to keep checking if it is > supported whenever the io_alloc_closid is queried. > > How about simplifying this to: > > /* > * note how this returns u32 that will eliminate > * unnecessary error checking in usages where io_alloc_closid > * needs to be determined after an resctrl_arch_get_io_alloc_enabled(r) > * already confirmed io_alloc is enabled > * function comment could note that this returns the CLOSID > * required by io_alloc but not whether the CLOSID can > * be supported, for this resctrl_io_alloc_closid_supported() should > * be used. > * Can also note that returned value will always be valid if > * resctrl_arch_get_io_alloc_enabled(r) is true. > */ > u32 resctrl_io_alloc_closid(struct rdt_resource *r) { > if (resctrl_arch_get_cdp_enabled(r->rid)) > return resctrl_arch_get_num_closid(r)/2 - 1 > else > return resctrl_arch_get_num_closid(r) -1 > } > > /* > * note how below already makes resctrl's io_alloc implementation > * more generic > */ > resctrl_io_alloc_closid_supported(u32 io_alloc_closid) { > return io_alloc_closid < closids_supported() > } > Sure. Changed the check to return io_alloc_closid == (closids_supported() -1) > > >> + >> /* >> * rdt_bit_usage_show - Display current usage of resources >> * >> @@ -1058,20 +1095,23 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of, >> struct rdt_ctrl_domain *dom; >> int i, hwb, swb, excl, psl; >> enum rdtgrp_mode mode; >> + int io_alloc_closid; >> bool sep = false; >> u32 ctrl_val; >> >> cpus_read_lock(); >> mutex_lock(&rdtgroup_mutex); >> - hw_shareable = r->cache.shareable_bits; >> list_for_each_entry(dom, &r->ctrl_domains, hdr.list) { >> if (sep) >> seq_putc(seq, ';'); >> + hw_shareable = r->cache.shareable_bits; >> sw_shareable = 0; >> exclusive = 0; >> seq_printf(seq, "%d=", dom->hdr.id); >> for (i = 0; i < closids_supported(); i++) { >> - if (!closid_allocated(i)) >> + if (!closid_allocated(i) || >> + (resctrl_arch_get_io_alloc_enabled(r) && >> + i == resctrl_io_alloc_closid_get(r))) >> continue; >> ctrl_val = resctrl_arch_get_config(r, dom, i, >> s->conf_type); >> @@ -1099,6 +1139,24 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of, >> break; >> } >> } >> + >> + /* >> + * When the "io_alloc" feature is enabled, a portion of the >> + * cache is reserved for shared use between hardware and software. >> + */ >> + if (resctrl_arch_get_io_alloc_enabled(r)) { >> + io_alloc_closid = resctrl_io_alloc_closid_get(r); > > In this implementation io_alloc_closid can be negative and the static > checker I used complained about the subsequent usage in > resctrl_arch_get_config() that must be unsigned. > Since resctrl_arch_get_io_alloc_enabled(r) already passed this is one > example where resctrl_io_alloc_closid() can be used. Sure. > >> + if (resctrl_arch_get_cdp_enabled(r->rid)) >> + ctrl_val = resctrl_arch_get_config(r, dom, >> + io_alloc_closid, >> + CDP_CODE); >> + else >> + ctrl_val = resctrl_arch_get_config(r, dom, >> + io_alloc_closid, >> + CDP_NONE); >> + hw_shareable |= ctrl_val; >> + } >> + >> for (i = r->cache.cbm_len - 1; i >= 0; i--) { >> pseudo_locked = dom->plr ? dom->plr->cbm : 0; >> hwb = test_bit(i, &hw_shareable); >> @@ -1803,6 +1861,142 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of, >> return ret ?: nbytes; >> } >> >> +static int resctrl_io_alloc_show(struct kernfs_open_file *of, >> + struct seq_file *seq, void *v) >> +{ >> + struct resctrl_schema *s = rdt_kn_parent_priv(of->kn); >> + struct rdt_resource *r = s->res; >> + > > Needs rdtgroup_mutex Sure. > >> + if (r->cache.io_alloc_capable) { >> + if (resctrl_arch_get_io_alloc_enabled(r)) >> + seq_puts(seq, "enabled\n"); >> + else >> + seq_puts(seq, "disabled\n"); >> + } else { >> + seq_puts(seq, "not supported\n"); >> + } >> + >> + return 0; >> +} >> + >> +/* >> + * Initialize io_alloc CLOSID cache resource with default CBM values. > > It is unclear what "default" means. > Could be "Initialize io_alloc CLOSID cache resource CBM with all usable (shared and unused) cache portions". Sure. > >> + */ >> +static int resctrl_io_alloc_init_cat(struct rdt_resource *r, >> + struct resctrl_schema *s, u32 closid) >> +{ >> + int ret; >> + >> + rdt_staged_configs_clear(); >> + >> + ret = rdtgroup_init_cat(s, closid); >> + if (ret < 0) >> + goto out_init_cat; > > The "out" label should reflect what is done at target, not the source. > Consider all the usages of "out_unlock" in resctrl. > Since this is the only label it can just be "out". Sure. > >> + >> + ret = resctrl_arch_update_domains(r, closid); >> + >> +out_init_cat: >> + rdt_staged_configs_clear(); >> + return ret; >> +} >> + >> +static const char *rdtgroup_name_by_closid(int closid) >> +{ >> + struct rdtgroup *rdtgrp; >> + >> + list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) { >> + if (rdtgrp->closid == closid) >> + return rdt_kn_name(rdtgrp->kn); >> + } >> + >> + return NULL; >> +} >> + >> +/* >> + * When CDP is enabled, io_alloc directs traffic using the highest CLOSID >> + * linked to an L3CODE resource. Although CBMs can be accessed through >> + * either L3CODE or L3DATA resources, any updates to the schemata must >> + * always be performed on L3CODE. >> + */ > > I think that updates to the schemata needs to be made on *both* L3DATA and L3CODE. > Consider how __init_one_rdt_domain() works when a new resource group is created ... > the algorithm looks through all allocated CLOSID and the associated schemata impact > the CBM of the new group. If an allocated CLOSID is associated with L3CODE > then its "peer" L3DATA is also taken into account, similar for L3DATA. > If only L3CODE is updated for the io_alloc_closid then it looks to me that > whatever the original L3DATA schema was will keep impacting new resource > groups. To avoid that and ensure only accurate CBMs are used it looks to me > as though the L3DATA and L3CODE schema needs to be kept in sync. Sure. Will verify this. > >> +static struct resctrl_schema *resctrl_schema_io_alloc(struct resctrl_schema *s) >> +{ >> + struct resctrl_schema *schema; >> + >> + if (s->conf_type == CDP_DATA) { >> + list_for_each_entry(schema, &resctrl_schema_all, list) { >> + if (schema->conf_type == CDP_CODE) >> + return schema; >> + } >> + } >> + >> + return s; >> +} >> + >> +static ssize_t resctrl_io_alloc_write(struct kernfs_open_file *of, char *buf, >> + size_t nbytes, loff_t off) >> +{ >> + struct resctrl_schema *s = rdt_kn_parent_priv(of->kn); >> + struct rdt_resource *r = s->res; >> + char const *grp_name; >> + u32 io_alloc_closid; >> + bool enable; >> + int ret; >> + >> + ret = kstrtobool(buf, &enable); >> + if (ret) >> + return ret; >> + >> + cpus_read_lock(); >> + mutex_lock(&rdtgroup_mutex); >> + >> + rdt_last_cmd_clear(); >> + >> + if (!r->cache.io_alloc_capable) { >> + rdt_last_cmd_puts("io_alloc feature is not supported on the resource\n"); > > This could be more useful with a small change: > "io_alloc is not supported on %s\n", s->name Yes. > >> + ret = -ENODEV; >> + goto out_io_alloc; > > out_io_alloc -> out_unlock Sure. > >> + } >> + >> + io_alloc_closid = resctrl_io_alloc_closid_get(r); >> + if (io_alloc_closid < 0) { >> + rdt_last_cmd_puts("Max CLOSID to support io_alloc is not available\n"); > > This could be more useful to help debug by printing the value of io_alloc_closid > that user can compare against output of num_closids files. Here the terms become > a bit complicated since ideally we want to use ctrl_hw_id but that does not match > "num_closids", so perhaps use both terms, for example "CLOSID (ctrl_hw_id)"? > I am not sure here. Fine with me. > >> + ret = -EINVAL; >> + goto out_io_alloc; >> + } >> + >> + if (resctrl_arch_get_io_alloc_enabled(r) != enable) { >> + if (enable) { >> + ret = resctrl_io_alloc_closid_alloc(io_alloc_closid); >> + if (ret < 0) { >> + grp_name = rdtgroup_name_by_closid(io_alloc_closid); > > Below handles !grp_name but that would be a kernel bug, no? Maybe WARN_ON_ONCE()? Sure. > >> + rdt_last_cmd_printf("CLOSID for io_alloc is used by %s group\n", >> + grp_name ? grp_name : "another"); > > > CLOSID -> ctrl_hw_id > sure. >> + ret = -EINVAL; >> + goto out_io_alloc; >> + } >> + >> + ret = resctrl_io_alloc_init_cat(r, resctrl_schema_io_alloc(s), >> + io_alloc_closid); >> + if (ret) { >> + rdt_last_cmd_puts("Failed to initialize io_alloc allocations\n"); >> + resctrl_io_alloc_closid_free(io_alloc_closid); >> + goto out_io_alloc; >> + } >> + >> + } else { >> + resctrl_io_alloc_closid_free(io_alloc_closid); >> + } >> + >> + ret = resctrl_arch_io_alloc_enable(r, enable); >> + } >> + >> +out_io_alloc: > > out_unlock ... to match the other places in resctrl. Sure. > >> + mutex_unlock(&rdtgroup_mutex); >> + cpus_read_unlock(); >> + >> + return ret ?: nbytes; >> +} >> + >> /* rdtgroup information files for one cache resource. */ >> static struct rftype res_common_files[] = { >> { >> @@ -1955,6 +2149,13 @@ static struct rftype res_common_files[] = { >> .seq_show = rdtgroup_schemata_show, >> .fflags = RFTYPE_CTRL_BASE, >> }, >> + { >> + .name = "io_alloc", >> + .mode = 0644, >> + .kf_ops = &rdtgroup_kf_single_ops, >> + .seq_show = resctrl_io_alloc_show, >> + .write = resctrl_io_alloc_write, >> + }, > > Please match existing code wrt tab usage. Sure. > >> { >> .name = "mba_MBps_event", >> .mode = 0644, >> @@ -2062,6 +2263,15 @@ static void thread_throttle_mode_init(void) >> RFTYPE_CTRL_INFO | RFTYPE_RES_MB); >> } >> >> +static void io_alloc_init(void) >> +{ >> + struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3); >> + >> + if (r->cache.io_alloc_capable) >> + resctrl_file_fflags_init("io_alloc", >> + RFTYPE_CTRL_INFO | RFTYPE_RES_CACHE); >> +} > > Note that even if it only checks the L3 cache resource, this will make the > file visible to all cache resource, also L2. This is why it is important to > ensure documentation and implementation also accommodates resources that > do not support io_alloc. Agree. > >> + >> void resctrl_file_fflags_init(const char *config, unsigned long fflags) >> { >> struct rftype *rft; >> @@ -4249,6 +4459,8 @@ int resctrl_init(void) >> >> thread_throttle_mode_init(); >> >> + io_alloc_init(); >> + >> ret = resctrl_mon_resource_init(); >> if (ret) >> return ret; > > Reinette > -- Thanks Babu Moger