Re: [PATCH v6 5/8] fs/resctrl: Add user interface to enable/disable io_alloc feature

"Moger, Babu" <babu.moger@xxxxxxx> · Thu, 19 Jun 2025 13:41:45 -0500

Hi Reinette,

On 6/17/25 22:59, Reinette Chatre wrote:
> Hi Babu,
> 
> On 6/11/25 2:23 PM, Babu Moger wrote:
>> The io_alloc feature in resctrl is a mechanism that enables direct
> 
> "The ... feature ... is a mechanism"? What does it mean when a feature
> is a mechanism? How about just "The io_alloc feature in resctrl enables ..."?

Sure.

> 
>> insertion of data from I/O devices into the L3 cache.
> 
> Drop "L3"?
> 

Sure.

>>
>> On AMD systems, io_alloc feature is backed by SDCIAE (L3 Smart Data Cache
>> Injection Allocation Enforcement). When enabled, SDCIAE forces all SDCI
>> lines to be placed into the L3 cache partitions identified by the
>> highest-supported L3_MASK_n register as reported by CPUID
>> Fn0000_0010_EDX_x1.MAX_COS. For example, if MAX_COS=15, SDCI lines will
>> be allocated into the L3 cache partitions determined by the bitmask in
>> the L3_MASK_15 register.
> 
> This is a resctrl fs patch but most of changelog so far is about AMD's implementation
> details. I do think it is relevant, but all the register details can
> probably be dropped since it is too low level to support the goal. I believe the
> goal here is to motivate resctrl fs's implementation that needs to pick highest
> CLOSID. So, essentially the changelog needs to hightlight that AMD's implementation
> requires highest CLOSID and without any other reference that is what resctrl fs's
> implementation supports. I think it is important to highlight that the
> user interface is not tied to this implementation decision to avoid future issues
> if another architecture support "io_alloc" "differently". Internals of
> resctrl fs can then be changed.
> 

Sure. Will split these patches. Will try make the text more generic.

>>
>> When CDP is enabled, io_alloc routes I/O traffic using the highest CLOSID
>> allocated for the instruction cache (L3CODE).
> 
> Again, this is a resctrl fs patch and above is an AMD implementation detail. resctrl
> fs should not be limited by how AMD implements the feature but can use it as first
> reference.

Sure.

> 
>>
>> Introduce user interface to enable/disable "io_alloc" feature.
> 
> This patch does more than this.

Sure.

> 
>>
>> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
>> ---
> ...
> 
>> ---
>>  Documentation/filesystems/resctrl.rst |  34 ++++
>>  fs/resctrl/rdtgroup.c                 | 216 +++++++++++++++++++++++++-
> 
> This patch contains several logical changes that are not adequately described
> in changelog.
> I think the following can be separate patches:
> - rdt_bit_usage_show() change
> - io_alloc_init() definition and usage
> - show() and write() helpers
> - possibly the io_alloc_closid helpers (more later)

Yes. Splitting this into 3 patches.
1. rdt_bit_usage_show() change
   Updates shareable_bits section.
   It calls - resctrl_io_alloc_closid().

2. io_alloc_init() definition and usage
  Introdce the inteface and add show()

3. Add io_alloc write().
   Calls resctrl_io_alloc_closid_supported() and
resctrl_io_alloc_closid_get().

> 
>>  2 files changed, 248 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index c7949dd44f2f..03c829b2c276 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -95,6 +95,11 @@ related to allocation:
>>  		some platforms support devices that have their
>>  		own settings for cache use which can over-ride
>>  		these bits.
>> +
>> +		When the "io_alloc" feature is enabled, a portion of the cache
>> +		is reserved for shared use between hardware and software. Refer
> 
> "reserved" and "shared" sounds like a contradiction. How about "is reserved" ->
> "can be configured"?
> 
>> +		to "bit_usage" to see which portion is allocated for this purpose.
> 
> This is the "shareable_bits" section and the reason why user is pointed to
> "bit_usage" is because "shareable_bits" is a global CBM that applies to all
> cache domains/instances while "bit_usage" is per cache domain/instance. I think
> it will be helpful to point this out to the user.
> Perhaps second sentence can be replaced with something like:
> ""bit_usage" should be used to see which portions of each cache instance
>  is configured for hardware use via the "io_alloc" feature because every cache
>  instance can have its "io_alloc" bitmask configured independently".

Sure.

""bit_usage" should be used to see which portions of each cache
instance is configured for hardware use via the "io_alloc" feature
because every cache instance can have its "io_alloc" bitmask
configured independently.

> 
> Please do improve this.
> 
> To complete this the first part of the "shareable_bits" doc can be amended:
> "Bitmask of shareable resource" -> "Bitmask (applicable to all instances of this resource) of shareable resource"
> What do you think?

Sure. Added one sentance for that.

Bitmask of shareable resource with other executing entities
(e.g. I/O). Applies to all instances of this resource.

> 
>> +
>>  "bit_usage":
>>  		Annotated capacity bitmasks showing how all
>>  		instances of the resource are used. The legend is:
>> @@ -135,6 +140,35 @@ related to allocation:
>>  			"1":
>>  			      Non-contiguous 1s value in CBM is supported.
>>  
>> +"io_alloc":
>> +		The "io_alloc" enables system software to configure the portion
> 
> "The "io_alloc" enables"? Maybe just ""io_alloc" enables"?
> 
>> +		of the L3 cache allocated for I/O traffic.
> 
> Drop "L3"?
> 
> Can append, for example, "File may only exist if the system supports this feature on some of its cache
> resources".
> 
>> +
>> +		The feature routes the I/O traffic via specific CLOSID reserved
>> +		for io_alloc feature. By configuring the CBM (Capacity Bit Mask)
>> +		for the CLOSID, users can control the L3 portions available for
>> +		I/0 traffic. The reserved CLOSID will be excluded for group creation.
> 
> Looking back I've commented *four* times already about how resctrl fs user interface
> should not be made specific to AMD's implementation.
> This paragraph just be dropped?
> 
> 
> The rest can be something like:
> 
> 		"disabled": Portions of cache used for allocation of I/O traffic cannot be configured.
> 		"enabled": Portions of cache used for allocation of I/O traffic can be configured using "io_alloc_cbm"
> 		"not supported": ...
> 
> 		The underlying implementation may reduce resources available to
> 		general (CPU) cache allocation. See architecture specific notes below.
> 		Depending on usage requirements the feature can be enabled or disabled:
> 
> 		To enable:
> 			# echo 1 > /sys/fs/resctrl/info/L3/io_alloc
> 
> 		To disable:
> 			# echo 0 > /sys/fs/resctrl/info/L3/io_alloc

Sure.

> 
>> +
>> +		The interface provides a means to query the status of the feature.
>> +
>> +		Example::
>> +
>> +			# cat /sys/fs/resctrl/info/L3/io_alloc
>> +			disabled
>> +
>> +		Feature can be enabled/disabled by writing to the interface.
>> +		Example::
>> +
>> +			# echo 1 > /sys/fs/resctrl/info/L3/io_alloc
>> +			# cat /sys/fs/resctrl/info/L3/io_alloc
>> +			enabled
>> +
>> +		On AMD systems, the io_alloc feature is supported by the L3 Smart
>> +		Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for
>> +		io_alloc is determined by the highest CLOSID supported by the resource.
>> +		When CDP is enabled, io_alloc routes I/O traffic using the highest
>> +		CLOSID allocated for the instruction cache (L3CODE).
>> +
>>  Memory bandwidth(MB) subdirectory contains the following files
>>  with respect to allocation:
>>  
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 1beb124e25f6..bbc032b4d0e9 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -70,6 +70,7 @@ static struct seq_buf last_cmd_status;
>>  static char last_cmd_status_buf[512];
>>  
>>  static int rdtgroup_setup_root(struct rdt_fs_context *ctx);
>> +static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid);
>>  
>>  static void rdtgroup_destroy_root(void);
>>  
>> @@ -232,6 +233,19 @@ bool closid_allocated(unsigned int closid)
>>  	return !test_bit(closid, closid_free_map);
>>  }
>>  
>> +static int resctrl_io_alloc_closid_alloc(u32 io_alloc_closid)
>> +{
>> +	if (__test_and_clear_bit(io_alloc_closid, closid_free_map))
>> +		return io_alloc_closid;
>> +	else
>> +		return -ENOSPC;
>> +}
>> +
>> +static void resctrl_io_alloc_closid_free(u32 io_alloc_closid)
>> +{
>> +	closid_free(io_alloc_closid);
>> +}
>> +
>>  /**
>>   * rdtgroup_mode_by_closid - Return mode of resource group with closid
>>   * @closid: closid if the resource group
>> @@ -1030,6 +1044,29 @@ static int rdt_shareable_bits_show(struct kernfs_open_file *of,
>>  	return 0;
>>  }
>>  
>> +/*
>> + * resctrl_io_alloc_closid_get - io_alloc feature uses max CLOSID to route
>> + * the IO traffic. Get the max CLOSID and verify if the CLOSID is available.
> 
> The function name should be followed by a *brief* description.

Sure.

> 
>> + *
>> + * The total number of CLOSIDs is determined in closid_init(),  based on the
>> + * minimum supported across all resources. If CDP (Code Data Prioritization)
>> + * is enabled, the number of CLOSIDs is halved. The final value is returned
>> + * by closids_supported(). Make sure this value aligns with the maximum
>> + * CLOSID supported by the respective resource.
> 
> All but the last sentence is unrelated to this function and the last sentence
> is very vague. What is "this value" that it refers to?

Removed it.

> 
>> + */
>> +static int resctrl_io_alloc_closid_get(struct rdt_resource *r)
>> +{
>> +	int num_closids = closids_supported();
>> +
>> +	if (resctrl_arch_get_cdp_enabled(r->rid))
>> +		num_closids *= 2;
>> +
>> +	if (num_closids != resctrl_arch_get_num_closid(r))
>> +		return -ENOSPC;
>> +
>> +	return closids_supported() - 1;
>> +}
> 
> resctrl_io_alloc_closid_get() seems to be trying to do two things: 
> - determine what the io_alloc_closid is
> - make sure the io_alloc_closid is supported
> 
> I think this should be split into two functions. Once the
> io_alloc_closid is determined to be supported and io_alloc
> enabled then there is no reason to keep checking if it is
> supported whenever the io_alloc_closid is queried.
> 
> How about simplifying this to:
> 
> /*
>  * note how this returns u32 that will eliminate
>  * unnecessary error checking in usages where io_alloc_closid
>  * needs to be determined after an resctrl_arch_get_io_alloc_enabled(r)
>  * already confirmed io_alloc is enabled
>  * function comment could note that this returns the CLOSID
>  * required by io_alloc but not whether the CLOSID can
>  * be supported, for this resctrl_io_alloc_closid_supported() should
>  * be used.
>  * Can also note that returned value will always be valid if
>  * resctrl_arch_get_io_alloc_enabled(r) is true.
>  */
> u32 resctrl_io_alloc_closid(struct rdt_resource *r) {
> 	if (resctrl_arch_get_cdp_enabled(r->rid))
> 		return resctrl_arch_get_num_closid(r)/2  - 1
> 	else
> 		return resctrl_arch_get_num_closid(r) -1
> }
> 
> /*
>  * note how below already makes resctrl's io_alloc implementation
>  * more generic
>  */
> resctrl_io_alloc_closid_supported(u32 io_alloc_closid) {
> 	return io_alloc_closid <  closids_supported()
> }
> 

Sure.
   Changed the check to

    return io_alloc_closid == (closids_supported() -1)

> 
> 
>> +
>>  /*
>>   * rdt_bit_usage_show - Display current usage of resources
>>   *
>> @@ -1058,20 +1095,23 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
>>  	struct rdt_ctrl_domain *dom;
>>  	int i, hwb, swb, excl, psl;
>>  	enum rdtgrp_mode mode;
>> +	int io_alloc_closid;
>>  	bool sep = false;
>>  	u32 ctrl_val;
>>  
>>  	cpus_read_lock();
>>  	mutex_lock(&rdtgroup_mutex);
>> -	hw_shareable = r->cache.shareable_bits;
>>  	list_for_each_entry(dom, &r->ctrl_domains, hdr.list) {
>>  		if (sep)
>>  			seq_putc(seq, ';');
>> +		hw_shareable = r->cache.shareable_bits;
>>  		sw_shareable = 0;
>>  		exclusive = 0;
>>  		seq_printf(seq, "%d=", dom->hdr.id);
>>  		for (i = 0; i < closids_supported(); i++) {
>> -			if (!closid_allocated(i))
>> +			if (!closid_allocated(i) ||
>> +			    (resctrl_arch_get_io_alloc_enabled(r) &&
>> +			     i == resctrl_io_alloc_closid_get(r)))
>>  				continue;
>>  			ctrl_val = resctrl_arch_get_config(r, dom, i,
>>  							   s->conf_type);
>> @@ -1099,6 +1139,24 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
>>  				break;
>>  			}
>>  		}
>> +
>> +		/*
>> +		 * When the "io_alloc" feature is enabled, a portion of the
>> +		 * cache is reserved for shared use between hardware and software.
>> +		 */
>> +		if (resctrl_arch_get_io_alloc_enabled(r)) {
>> +			io_alloc_closid = resctrl_io_alloc_closid_get(r);
> 
> In this implementation io_alloc_closid can be negative and the static
> checker I used complained about the subsequent usage in 
> resctrl_arch_get_config() that must be unsigned.
> Since resctrl_arch_get_io_alloc_enabled(r) already passed this is one
> example where resctrl_io_alloc_closid() can be used.

Sure.

> 
>> +			if (resctrl_arch_get_cdp_enabled(r->rid))
>> +				ctrl_val = resctrl_arch_get_config(r, dom,
>> +								   io_alloc_closid,
>> +								   CDP_CODE);
>> +			else
>> +				ctrl_val = resctrl_arch_get_config(r, dom,
>> +								   io_alloc_closid,
>> +								   CDP_NONE);
>> +			hw_shareable |= ctrl_val;
>> +		}
>> +
>>  		for (i = r->cache.cbm_len - 1; i >= 0; i--) {
>>  			pseudo_locked = dom->plr ? dom->plr->cbm : 0;
>>  			hwb = test_bit(i, &hw_shareable);
>> @@ -1803,6 +1861,142 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
>>  	return ret ?: nbytes;
>>  }
>>  
>> +static int resctrl_io_alloc_show(struct kernfs_open_file *of,
>> +				 struct seq_file *seq, void *v)
>> +{
>> +	struct resctrl_schema *s = rdt_kn_parent_priv(of->kn);
>> +	struct rdt_resource *r = s->res;
>> +
> 
> Needs rdtgroup_mutex

Sure.

> 
>> +	if (r->cache.io_alloc_capable) {
>> +		if (resctrl_arch_get_io_alloc_enabled(r))
>> +			seq_puts(seq, "enabled\n");
>> +		else
>> +			seq_puts(seq, "disabled\n");
>> +	} else {
>> +		seq_puts(seq, "not supported\n");
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Initialize io_alloc CLOSID cache resource with default CBM values.
> 
> It is unclear what "default" means.
> Could be "Initialize io_alloc CLOSID cache resource CBM with all usable (shared and unused) cache portions".

Sure.

> 
>> + */
>> +static int resctrl_io_alloc_init_cat(struct rdt_resource *r,
>> +				     struct resctrl_schema *s, u32 closid)
>> +{
>> +	int ret;
>> +
>> +	rdt_staged_configs_clear();
>> +
>> +	ret = rdtgroup_init_cat(s, closid);
>> +	if (ret < 0)
>> +		goto out_init_cat;
> 
> The "out" label should reflect what is done at target, not the source.
> Consider all the usages of "out_unlock" in resctrl.
> Since this is the only label it can just be "out".

Sure.

> 
>> +
>> +	ret = resctrl_arch_update_domains(r, closid);
>> +
>> +out_init_cat:
>> +	rdt_staged_configs_clear();
>> +	return ret;
>> +}
>> +
>> +static const char *rdtgroup_name_by_closid(int closid)
>> +{
>> +	struct rdtgroup *rdtgrp;
>> +
>> +	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
>> +		if (rdtgrp->closid == closid)
>> +			return rdt_kn_name(rdtgrp->kn);
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +/*
>> + * When CDP is enabled, io_alloc directs traffic using the highest CLOSID
>> + * linked to an L3CODE resource. Although CBMs can be accessed through
>> + * either L3CODE or L3DATA resources, any updates to the schemata must
>> + * always be performed on L3CODE.
>> + */
> 
> I think that updates to the schemata needs to be made on *both* L3DATA and L3CODE.
> Consider how __init_one_rdt_domain() works when a new resource group is created ...
> the algorithm looks through all allocated CLOSID and the associated schemata impact
> the CBM of the new group. If an allocated CLOSID is associated with L3CODE
> then its "peer" L3DATA is also taken into account, similar for L3DATA.
> If only L3CODE is updated for the io_alloc_closid then it looks to me that
> whatever the original L3DATA schema was will keep impacting new resource
> groups. To avoid that and ensure only accurate CBMs are used it looks to me
> as though the L3DATA and L3CODE schema needs to be kept in sync.

Sure. Will verify this.

> 
>> +static struct resctrl_schema *resctrl_schema_io_alloc(struct resctrl_schema *s)
>> +{
>> +	struct resctrl_schema *schema;
>> +
>> +	if (s->conf_type == CDP_DATA) {
>> +		list_for_each_entry(schema, &resctrl_schema_all, list) {
>> +			if (schema->conf_type == CDP_CODE)
>> +				return schema;
>> +		}
>> +	}
>> +
>> +	return s;
>> +}
>> +
>> +static ssize_t resctrl_io_alloc_write(struct kernfs_open_file *of, char *buf,
>> +				      size_t nbytes, loff_t off)
>> +{
>> +	struct resctrl_schema *s = rdt_kn_parent_priv(of->kn);
>> +	struct rdt_resource *r = s->res;
>> +	char const *grp_name;
>> +	u32 io_alloc_closid;
>> +	bool enable;
>> +	int ret;
>> +
>> +	ret = kstrtobool(buf, &enable);
>> +	if (ret)
>> +		return ret;
>> +
>> +	cpus_read_lock();
>> +	mutex_lock(&rdtgroup_mutex);
>> +
>> +	rdt_last_cmd_clear();
>> +
>> +	if (!r->cache.io_alloc_capable) {
>> +		rdt_last_cmd_puts("io_alloc feature is not supported on the resource\n");
> 
> This could be more useful with a small change:
> "io_alloc is not supported on %s\n", s->name

Yes.

> 
>> +		ret = -ENODEV;
>> +		goto out_io_alloc;
> 
> out_io_alloc -> out_unlock

Sure.

> 
>> +	}
>> +
>> +	io_alloc_closid = resctrl_io_alloc_closid_get(r);
>> +	if (io_alloc_closid < 0) {
>> +		rdt_last_cmd_puts("Max CLOSID to support io_alloc is not available\n");
> 
> This could be more useful to help debug by printing the value of io_alloc_closid
> that user can compare against output of num_closids files. Here the terms become
> a bit complicated since ideally we want to use ctrl_hw_id but that does not match
> "num_closids", so perhaps use both terms, for example "CLOSID (ctrl_hw_id)"?
> I am not sure here.

Fine with me.

> 
>> +		ret = -EINVAL;
>> +		goto out_io_alloc;
>> +	}
>> +
>> +	if (resctrl_arch_get_io_alloc_enabled(r) != enable) {
>> +		if (enable) {
>> +			ret = resctrl_io_alloc_closid_alloc(io_alloc_closid);
>> +			if (ret < 0) {
>> +				grp_name = rdtgroup_name_by_closid(io_alloc_closid);
> 
> Below handles !grp_name but that would be a kernel bug, no? Maybe WARN_ON_ONCE()?

Sure.

> 
>> +				rdt_last_cmd_printf("CLOSID for io_alloc is used by %s group\n",
>> +						    grp_name ? grp_name : "another");
> 
> 
> CLOSID -> ctrl_hw_id
> 
sure.

>> +				ret = -EINVAL;
>> +				goto out_io_alloc;
>> +			}
>> +
>> +			ret = resctrl_io_alloc_init_cat(r, resctrl_schema_io_alloc(s),
>> +							io_alloc_closid);
>> +			if (ret) {
>> +				rdt_last_cmd_puts("Failed to initialize io_alloc allocations\n");
>> +				resctrl_io_alloc_closid_free(io_alloc_closid);
>> +				goto out_io_alloc;
>> +			}
>> +
>> +		} else {
>> +			resctrl_io_alloc_closid_free(io_alloc_closid);
>> +		}
>> +
>> +		ret = resctrl_arch_io_alloc_enable(r, enable);
>> +	}
>> +
>> +out_io_alloc:
> 
> out_unlock ... to match the other places in resctrl.

Sure.

> 
>> +	mutex_unlock(&rdtgroup_mutex);
>> +	cpus_read_unlock();
>> +
>> +	return ret ?: nbytes;
>> +}
>> +
>>  /* rdtgroup information files for one cache resource. */
>>  static struct rftype res_common_files[] = {
>>  	{
>> @@ -1955,6 +2149,13 @@ static struct rftype res_common_files[] = {
>>  		.seq_show	= rdtgroup_schemata_show,
>>  		.fflags		= RFTYPE_CTRL_BASE,
>>  	},
>> +	{
>> +		.name           = "io_alloc",
>> +		.mode           = 0644,
>> +		.kf_ops         = &rdtgroup_kf_single_ops,
>> +		.seq_show       = resctrl_io_alloc_show,
>> +		.write          = resctrl_io_alloc_write,
>> +	},
> 
> Please match existing code wrt tab usage.

Sure.

> 
>>  	{
>>  		.name		= "mba_MBps_event",
>>  		.mode		= 0644,
>> @@ -2062,6 +2263,15 @@ static void thread_throttle_mode_init(void)
>>  				 RFTYPE_CTRL_INFO | RFTYPE_RES_MB);
>>  }
>>  
>> +static void io_alloc_init(void)
>> +{
>> +	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>> +
>> +	if (r->cache.io_alloc_capable)
>> +		resctrl_file_fflags_init("io_alloc",
>> +					 RFTYPE_CTRL_INFO | RFTYPE_RES_CACHE);
>> +}
> 
> Note that even if it only checks the L3 cache resource, this will make the
> file visible to all cache resource, also L2. This is why it is important to
> ensure documentation and implementation also accommodates resources that
> do not support io_alloc.

Agree.
> 
>> +
>>  void resctrl_file_fflags_init(const char *config, unsigned long fflags)
>>  {
>>  	struct rftype *rft;
>> @@ -4249,6 +4459,8 @@ int resctrl_init(void)
>>  
>>  	thread_throttle_mode_init();
>>  
>> +	io_alloc_init();
>> +
>>  	ret = resctrl_mon_resource_init();
>>  	if (ret)
>>  		return ret;
> 
> Reinette
> 

-- 
Thanks
Babu Moger