Re: [PATCH v4 0/8] Support L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Babu,

On 5/2/25 5:53 PM, Moger, Babu wrote:
> Hi Reinette,
> 
> Thanks for quick turnaround.
> 
> On 5/2/2025 4:20 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 4/21/25 3:43 PM, Babu Moger wrote:
>>> # Linux Implementation
>>>
>>> Feature adds following interface files when the resctrl "io_alloc" feature is
>>> supported on L3 resource:
>>>
>>> /sys/fs/resctrl/info/L3/io_alloc: Report the feature status. Enable/disable the
>>>                   feature by writing to the interface.
>>>
>>> /sys/fs/resctrl/info/L3/io_alloc_cbm:  List the Capacity Bit Masks (CBMs) available
>>>                        for I/O devices when io_alloc feature is enabled.
>>>                        Configure the CBM by writing to the interface.
>>>
>>> # Examples:
>>>
>>> a. Check if io_alloc feature is available
>>>     #mount -t resctrl resctrl /sys/fs/resctrl/
>>>
>>>     # cat /sys/fs/resctrl/info/L3/io_alloc
>>>     disabled
>>>
>>> b. Enable the io_alloc feature.
>>>
>>>     # echo 1 > /sys/fs/resctrl/info/L3/io_alloc
>>>     # cat /sys/fs/resctrl/info/L3/io_alloc
>>>     enabled
>>>
>>> c. Check the CBM values for the io_alloc feature.
>>>
>>>     # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
>>>     L3:0=ffff;1=ffff
>>>
>>> d. Change the CBM value for the domain 1:
>>>     # echo L3:1=FF > /sys/fs/resctrl/info/L3/io_alloc_cbm
>>>
>>>     # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
>>>     L3:0=ffff;1=00ff
>>>
>>> d. Disable io_alloc feature and exit.
>>>
>>>     # echo 0 > /sys/fs/resctrl/info/L3/io_alloc
>>>     # cat /sys/fs/resctrl/info/L3/io_alloc
>>>     disabled
>>>
>>>     #umount /sys/fs/resctrl/
>>>
>>
>>> From what I can tell the interface when CDP is enabled will look
>> as follows:
>>
>>       # mount -o cdp -t resctrl resctrl /sys/fs/resctrl/
>>       # cat /sys/fs/resctrl/info/L3CODE/io_alloc
>>       disabled
>>       # cat /sys/fs/resctrl/info/L3DATA/io_alloc
>>       not supported
>>   "io_alloc" can thus be enabled for L3CODE but not for L3DATA.
>> This is unexpected considering the feature is called
>> "L3 Smart *Data* Cache Injection Allocation Enforcement".
>>
>> I understand that the interface evolved into this because the
>> "code" allocation of CDP uses the CLOSID required by SDCIAE but I think
>> leaking implementation details like this to the user interface can
>> cause confusion.
>>
>> Since there is no distinction between code and data in these
>> IO allocations, what do you think of connecting the io_alloc and
>> io_alloc_cbm files within L3CODE and L3DATA so that the user can
>> read/write from either with a read showing the same data and
>> user able to write to either? For example,
>>
>>       # mount -o cdp -t resctrl resctrl /sys/fs/resctrl/
>>       # cat /sys/fs/resctrl/info/L3CODE/io_alloc
>>       disabled
>>       # cat /sys/fs/resctrl/info/L3DATA/io_alloc
>>       disabled
>>     # echo 1 > /sys/fs/resctrl/info/L3CODE/io_alloc
>>       # cat /sys/fs/resctrl/info/L3CODE/io_alloc
>>       enabled
>>       # cat /sys/fs/resctrl/info/L3DATA/io_alloc
>>       enabled
>>       # cat /sys/fs/resctrl/info/L3DATA/io_alloc_cbm
>>       0=ffff;1=ffff
>>       # cat /sys/fs/resctrl/info/L3CODE/io_alloc_cbm
>>       0=ffff;1=ffff
>>       # echo 1=FF > /sys/fs/resctrl/info/L3DATA/io_alloc_cbm
>>       # cat /sys/fs/resctrl/info/L3DATA/io_alloc_cbm
>>       0=ffff;1=00ff
>>       # cat /sys/fs/resctrl/info/L3CODE/io_alloc_cbm
>>       0=ffff;1=00ff
> 
> I agree. There is no right or wrong here. It can be done this way like you mentioned above. But I am not sure if will clear the confusion.
> 
> We have already added the text in user doc (also spec says the same).
> 
> "On AMD systems, the io_alloc feature is supported by the L3 Smart
> Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for
> io_alloc is determined by the highest CLOSID supported by the resource.
> When CDP is enabled, io_alloc routes I/O traffic using the highest
> CLOSID allocated for the instruction cache (L3CODE).
> 
> Dont you think this text might clear the confusion? We can add examples also if that makes it even more clear.

The user interface is not intended to be a mirror of the hardware interface.
If it was, doing so is becoming increasingly difficult with multiple
architectures with different hardware intefaces needing to use the same
user interface for control. Remember, there are no "CLOSID" in MPAM and
I do not know details of what RISC-V brings.

We should aim to have something as generic as possible that makes sense
for user space. All the hardware interface details should be hidden as much
as possible from user interface. When we expose the hardware interface details
it becomes very difficult to support new use cases.

The only aspect of "closids" that has been exposed to user space thus far
is the "num_closids" and in user documentation a CLOSid has been linked to the
number of control groups. That is the only constraint we need to think about
here. I have repeatedly asked for IO alloc connection with CLOSIDs to not be exposed
to user space (yet user documentation and messages to user space keeps doing so
in this series). Support for IO alloc in this way is unique to AMD. We do not want
resctrl to be constrained like this if another architecture needs to support
some form of IO alloc and does so in a different way.

I understand that IO alloc backed by CLOSID is forming part of resctrl fs in this
implementation and that is ok for now. As long as we do not leak this to user space
it gives use flexibility to change resctrl fs when/if we learn different architecture
needs later.

>>   (Note in above I removed the resource name from io_alloc_cbm to match
>> what was discussed during previous version:
>> https://lore.kernel.org/lkml/251c8fe1-603f-4993-a822-afb35b49cdfa@xxxxxxx/ )
>> What do you think?
> 
> Yes. I remember. "Kept the resource name while printing the CBM for io_alloc, so we dont have to change show_doms() just for this feature and it is consistant across all the schemata display.

It almost sounds like you do not want to implement something because the
code to support it does not exist?

> 
> I added the note in here.
> https://lore.kernel.org/lkml/784fbc61e02e9a834473c3476ee196ef6a44e338.1745275431.git.babu.moger@xxxxxxx/

You mention "I dont have to change show_doms() just for this feature and it is
consistant across all the schemata display." 
I am indeed seeing a pattern where one goal is to add changes by changing minimum
amount of code. Please let this not be a goal but instead make it a goal to integrate
changes into resctrl appropriately, not just pasted on top.

When it comes to the schemata display then it makes sense to add the resource name since
the schemata file is within a resource group containing multiple resources and the schemata
file thus needs to identify resources. Compare this to, for example, the "bit_usage" file
that is unique to a resource and thus no need to identify the resource.

> 
> I will change it if you feel strongly about it. We will have to change show_doms() to handle this.

What is the problem with changing show_doms()?

> 
>>
>>  
>>> ---
>>> v4: The "io_alloc" interface will report "enabled/disabled/not supported"
>>>      instead of 0 or 1..
>>>
>>>      Updated resctrl_io_alloc_closid_get() to verify the max closid availability
>>>      using closids_supported().
>>>
>>>      Updated the documentation for "shareable_bits" and "bit_usage".
>>>
>>>      NOTE: io_alloc is about specific CLOS. rdt_bit_usage_show() is not designed
>>>      handle bit_usage for specific CLOS. Its about overall system. So, we cannot
>>>      really tell the user which CLOS is shared across both hardware and software.
>>
>> "bit_usage" is not about CLOS but how the resource is used. Per the doc:
>>
>> "bit_usage":
>>         Annotated capacity bitmasks showing how all
>>         instances of the resource are used.
>>
>> The key here is the CBM, not CLOS. For each bit in the *CBM* "bit_usage" shows
>> how that portion of the cache is used with the legend documented in
>> Documentation/arch/x86/resctrl.rst.
>>
>> Consider a system with the following allocations:
>> # cat /sys/fs/resctrl/schemata
>> L3:0=0ff0
> 
> This is CLOS 0.
> 
>> # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
>> 0=ff00
> 
> This is CLOS 15.
> 
>>
>> Then "bit_usage" will look like:
>>
>> # cat /sys/fs/resctrl/info/L3/bit_usage
>> 0=HHHHXXXXSSSS0000
> 
> It is confusing here. To make it clear we may have to print all the CLOSes in each domain.

Could you please elaborate how this is confusing?

> 
> # cat /sys/fs/resctrl/info/L3/bit_usage
> DOM0=CLOS0:SSSSSSSSSSSSSSSS;... ;CLOS15=HHHHXXXXSSSS0000;
> DOM1=CLOS0:SSSSSSSSSSSSSSSS;... ;CLOS15=HHHHXXXXSSSS0000

Please no. Not just does this change existing user interface it also breaks the goal of
"bit_usage".

Please think of it from user perspective. If user wants to know, for example, "how is my
L3 cache allocated" then the "bit_usage" file provides that summary. 

>> "bit_usage" shows how the cache is being used. It shows that the portion of cache represented
>> by first four bits of CBM is unused, portion of cache represented by bits 4 to 7 of CBM is
>> only used by software, portion of cache represented by bits 8 to 11 of CBM is shared between
>> software and hardware, portion of cache represented by bits 12 to 15 is only used by hardware.
>>
>>>      This is something we need to discuss.
>>
>> Looking at implementation in patch #5 the "io_alloc_cbm" bits of CBM are presented
>> as software bits, since "io_alloc_cbm" represents IO from devices it should be "hardware" bits
>> (hw_shareable), no?
>>
> Yes. It is. But logic is bit different there.
> 
> It loops thru all the CLOSes on the domain. So, it will print again like this below.

This is what current code does, but the code can be changed, no? For example, rdt_bit_usage_show()
does not need to treat the IO allocation like all the other resource groups but instead handle it
separately. Below us some pseudo code that presents the idea, untested, not compiled.

	hw_shareable = r->cache.shareable_bits;

	for (i = 0; i < closids_supported(); i++) {
		if (!closid_allocated(i) || 
		    (resctrl_arch_get_io_alloc_enabled(r) && i == resctrl_io_alloc_closid_get(r, s)))
			continue;

		/*  Intitialize sw_shareable and exclusive */
	}

	if (resctrl_arch_get_io_alloc_enabled(r)) {
		 /*
		  * Sidenote: I do not think schemata parameter is needed for
		  * resctrl_io_alloc_closid_get()
		  */
		io_alloc_closid = resctrl_io_alloc_closid_get(r, s);
		if (resctrl_arch_get_cdp_enabled(r->rid))
			ctrl_val = resctrl_arch_get_config(r, dom, io_alloc_closid, CDP_CODE);
		else
			ctrl_val = resctrl_arch_get_config(r, dom, io_alloc_closid, CDP_NONE);
		hw_shareable |= ctrl_val;
	}

	for (i = r->cache.cbm_len - 1; i >= 0; i--) {
		/* Write annotated bitmask to user space */
	}
	
> 
> #cat bit_usage
> 0=HHHHXXXXSSSS0000
> 
> It tells the user that all the CLOSes in domain 0 has this sharing propery which is not correct.
> 
> To make it clear we really need to print every CLOS here. What do you think?

No. We cannot just change user space API like this. The way I see it the implementation can
support existing user space API. I am sure the above can be improved but it presents an idea
that we can use to start with.

Reinette





[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux