Re: [PATCH v4 0/8] Support L3 Smart Data Cache Injection Allocation Enforcement (SDCIAE)

Hi Reinette,

On 5/5/25 11:22, Reinette Chatre wrote:
> Hi Babu,
> 
> On 5/2/25 5:53 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> Thanks for quick turnaround.
>>
>> On 5/2/2025 4:20 PM, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 4/21/25 3:43 PM, Babu Moger wrote:
>>>> # Linux Implementation
>>>>
>>>> The feature adds the following interface files when the resctrl "io_alloc" feature is
>>>> supported on the L3 resource:
>>>>
>>>> /sys/fs/resctrl/info/L3/io_alloc: Report the feature status. Enable/disable the
>>>>                   feature by writing to the interface.
>>>>
>>>> /sys/fs/resctrl/info/L3/io_alloc_cbm:  List the Capacity Bit Masks (CBMs) available
>>>>                        for I/O devices when io_alloc feature is enabled.
>>>>                        Configure the CBM by writing to the interface.
>>>>
>>>> # Examples:
>>>>
>>>> a. Check if io_alloc feature is available
>>>>     # mount -t resctrl resctrl /sys/fs/resctrl/
>>>>
>>>>     # cat /sys/fs/resctrl/info/L3/io_alloc
>>>>     disabled
>>>>
>>>> b. Enable the io_alloc feature.
>>>>
>>>>     # echo 1 > /sys/fs/resctrl/info/L3/io_alloc
>>>>     # cat /sys/fs/resctrl/info/L3/io_alloc
>>>>     enabled
>>>>
>>>> c. Check the CBM values for the io_alloc feature.
>>>>
>>>>     # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
>>>>     L3:0=ffff;1=ffff
>>>>
>>>> d. Change the CBM value for the domain 1:
>>>>     # echo L3:1=FF > /sys/fs/resctrl/info/L3/io_alloc_cbm
>>>>
>>>>     # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
>>>>     L3:0=ffff;1=00ff
>>>>
>>>> e. Disable the io_alloc feature and exit.
>>>>
>>>>     # echo 0 > /sys/fs/resctrl/info/L3/io_alloc
>>>>     # cat /sys/fs/resctrl/info/L3/io_alloc
>>>>     disabled
>>>>
>>>>     # umount /sys/fs/resctrl/
>>>>
>>>
>>> From what I can tell the interface when CDP is enabled will look
>>> as follows:
>>>
>>>       # mount -o cdp -t resctrl resctrl /sys/fs/resctrl/
>>>       # cat /sys/fs/resctrl/info/L3CODE/io_alloc
>>>       disabled
>>>       # cat /sys/fs/resctrl/info/L3DATA/io_alloc
>>>       not supported
>>>   "io_alloc" can thus be enabled for L3CODE but not for L3DATA.
>>> This is unexpected considering the feature is called
>>> "L3 Smart *Data* Cache Injection Allocation Enforcement".
>>>
>>> I understand that the interface evolved into this because the
>>> "code" allocation of CDP uses the CLOSID required by SDCIAE but I think
>>> leaking implementation details like this to the user interface can
>>> cause confusion.
>>>
>>> Since there is no distinction between code and data in these
>>> IO allocations, what do you think of connecting the io_alloc and
>>> io_alloc_cbm files within L3CODE and L3DATA so that the user can
>>> read/write from either with a read showing the same data and
>>> user able to write to either? For example,
>>>
>>>       # mount -o cdp -t resctrl resctrl /sys/fs/resctrl/
>>>       # cat /sys/fs/resctrl/info/L3CODE/io_alloc
>>>       disabled
>>>       # cat /sys/fs/resctrl/info/L3DATA/io_alloc
>>>       disabled
>>>       # echo 1 > /sys/fs/resctrl/info/L3CODE/io_alloc
>>>       # cat /sys/fs/resctrl/info/L3CODE/io_alloc
>>>       enabled
>>>       # cat /sys/fs/resctrl/info/L3DATA/io_alloc
>>>       enabled
>>>       # cat /sys/fs/resctrl/info/L3DATA/io_alloc_cbm
>>>       0=ffff;1=ffff
>>>       # cat /sys/fs/resctrl/info/L3CODE/io_alloc_cbm
>>>       0=ffff;1=ffff
>>>       # echo 1=FF > /sys/fs/resctrl/info/L3DATA/io_alloc_cbm
>>>       # cat /sys/fs/resctrl/info/L3DATA/io_alloc_cbm
>>>       0=ffff;1=00ff
>>>       # cat /sys/fs/resctrl/info/L3CODE/io_alloc_cbm
>>>       0=ffff;1=00ff
>>
>> I agree. There is no right or wrong here. It can be done this way, like you mentioned above. But I am not sure if it will clear the confusion.
>>
>> We have already added the text in user doc (also spec says the same).
>>
>> "On AMD systems, the io_alloc feature is supported by the L3 Smart
>> Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for
>> io_alloc is determined by the highest CLOSID supported by the resource.
>> When CDP is enabled, io_alloc routes I/O traffic using the highest
>> CLOSID allocated for the instruction cache (L3CODE).
>>
>> Don't you think this text might clear the confusion? We can also add examples if that makes it even clearer.
> 
> The user interface is not intended to be a mirror of the hardware interface.
> Even if it were, doing so is becoming increasingly difficult with multiple
> architectures with different hardware interfaces needing to use the same
> user interface for control. Remember, there are no "CLOSIDs" in MPAM and
> I do not know the details of what RISC-V brings.
> 
> We should aim to have something as generic as possible that makes sense
> for user space. All the hardware interface details should be hidden as much
> as possible from user interface. When we expose the hardware interface details
> it becomes very difficult to support new use cases.
> 
> The only aspect of "closids" that has been exposed to user space thus far
> is the "num_closids" and in user documentation a CLOSid has been linked to the
> number of control groups. That is the only constraint we need to think about
> here. I have repeatedly asked for IO alloc connection with CLOSIDs to not be exposed
> to user space (yet user documentation and messages to user space keeps doing so
> in this series). Support for IO alloc in this way is unique to AMD. We do not want
> resctrl to be constrained like this if another architecture needs to support
> some form of IO alloc and does so in a different way.
> 
> I understand that IO alloc backed by CLOSID is forming part of resctrl fs in this
> implementation and that is ok for now. As long as we do not leak this to user space
> it gives us flexibility to change resctrl fs when/if we learn of different
> architecture needs later.

That makes sense. I’ll go ahead and adjust it as suggested.

> 

>>>   (Note in above I removed the resource name from io_alloc_cbm to match
>>> what was discussed during previous version:
>>> https://lore.kernel.org/lkml/251c8fe1-603f-4993-a822-afb35b49cdfa@xxxxxxx/ )
>>> What do you think?
>>
>> Yes, I remember. I kept the resource name while printing the CBM for io_alloc so we don't have to change show_doms() just for this feature and it is consistent across all the schemata display.
> 
> It almost sounds like you do not want to implement something because the
> code to support it does not exist?
> 
>>
>> I added the note in here.
>> https://lore.kernel.org/lkml/784fbc61e02e9a834473c3476ee196ef6a44e338.1745275431.git.babu.moger@xxxxxxx/
> 
> You mention "I dont have to change show_doms() just for this feature and it is
> consistant across all the schemata display." 
> I am indeed seeing a pattern where one goal is to add changes with the minimum
> amount of code. Please let this not be a goal but instead make it a goal to integrate
> changes into resctrl appropriately, not just pasted on top.
> 
> When it comes to the schemata display then it makes sense to add the resource name since
> the schemata file is within a resource group containing multiple resources and the schemata
> file thus needs to identify resources. Compare this to, for example, the "bit_usage" file
> that is unique to a resource and thus no need to identify the resource.
> 
>>
>> I will change it if you feel strongly about it. We will have to change show_doms() to handle this.
> 
> What is the problem with changing show_doms()?

There is no problem changing show_doms(). My intention was to keep the
change as minimal as possible.

Sure. I will make the changes to not print the resource name for io_alloc_cbm.

> 
>>
>>>
>>>  
>>>> ---
>>>> v4: The "io_alloc" interface will report "enabled/disabled/not supported"
>>>>      instead of 0 or 1.
>>>>
>>>>      Updated resctrl_io_alloc_closid_get() to verify the max closid availability
>>>>      using closids_supported().
>>>>
>>>>      Updated the documentation for "shareable_bits" and "bit_usage".
>>>>
>>>>      NOTE: io_alloc is about a specific CLOS. rdt_bit_usage_show() is not designed
>>>>      to handle bit_usage for a specific CLOS. It is about the overall system. So, we
>>>>      cannot really tell the user which CLOS is shared across both hardware and software.
>>>
>>> "bit_usage" is not about CLOS but how the resource is used. Per the doc:
>>>
>>> "bit_usage":
>>>         Annotated capacity bitmasks showing how all
>>>         instances of the resource are used.
>>>
>>> The key here is the CBM, not CLOS. For each bit in the *CBM* "bit_usage" shows
>>> how that portion of the cache is used with the legend documented in
>>> Documentation/arch/x86/resctrl.rst.
>>>
>>> Consider a system with the following allocations:
>>> # cat /sys/fs/resctrl/schemata
>>> L3:0=0ff0
>>
>> This is CLOS 0.
>>
>>> # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
>>> 0=ff00
>>
>> This is CLOS 15.
>>
>>>
>>> Then "bit_usage" will look like:
>>>
>>> # cat /sys/fs/resctrl/info/L3/bit_usage
>>> 0=HHHHXXXXSSSS0000
>>
>> It is confusing here. To make it clear we may have to print all the CLOSes in each domain.
> 
> Could you please elaborate how this is confusing?

# cat /sys/fs/resctrl/info/L3/bit_usage
0=HHHHXXXXSSSS0000

This may give the impression that all CLOSes in all domains carry this
property, but in reality it applies only to one CLOS (15) within each
domain.

Example below....

> 
>>
>> # cat /sys/fs/resctrl/info/L3/bit_usage
>> DOM0=CLOS0:SSSSSSSSSSSSSSSS;... ;CLOS15:HHHHXXXXSSSS0000;
>> DOM1=CLOS0:SSSSSSSSSSSSSSSS;... ;CLOS15:HHHHXXXXSSSS0000
> 
> Please no. Not just does this change existing user interface it also breaks the goal of
> "bit_usage".
> 
> Please think of it from user perspective. If user wants to know, for example, "how is my
> L3 cache allocated" then the "bit_usage" file provides that summary. 
> 
>>> "bit_usage" shows how the cache is being used. It shows that the portion of cache represented
>>> by first four bits of CBM is unused, portion of cache represented by bits 4 to 7 of CBM is
>>> only used by software, portion of cache represented by bits 8 to 11 of CBM is shared between
>>> software and hardware, portion of cache represented by bits 12 to 15 is only used by hardware.
>>>
>>>>      This is something we need to discuss.
>>>
>>> Looking at implementation in patch #5 the "io_alloc_cbm" bits of CBM are presented
>>> as software bits, since "io_alloc_cbm" represents IO from devices it should be "hardware" bits
>>> (hw_shareable), no?
>>>
>> Yes, it is. But the logic is a bit different there.
>>
>> It loops through all the CLOSes in the domain, so it will print again like this below.
> 
> This is what the current code does, but the code can be changed, no? For example, rdt_bit_usage_show()
> does not need to treat the IO allocation like all the other resource groups but can instead handle it
> separately. Below is some pseudo code that presents the idea; untested, not compiled.
> 
> 	hw_shareable = r->cache.shareable_bits;
> 
> 	for (i = 0; i < closids_supported(); i++) {
> 		if (!closid_allocated(i) || 
> 		    (resctrl_arch_get_io_alloc_enabled(r) && i == resctrl_io_alloc_closid_get(r, s)))
> 			continue;
> 
> 		/* Initialize sw_shareable and exclusive */
> 	}
> 
> 	if (resctrl_arch_get_io_alloc_enabled(r)) {
> 		 /*
> 		  * Sidenote: I do not think schemata parameter is needed for
> 		  * resctrl_io_alloc_closid_get()

Sure. Got it.

> 		  */
> 		io_alloc_closid = resctrl_io_alloc_closid_get(r, s);
> 		if (resctrl_arch_get_cdp_enabled(r->rid))
> 			ctrl_val = resctrl_arch_get_config(r, dom, io_alloc_closid, CDP_CODE);
> 		else
> 			ctrl_val = resctrl_arch_get_config(r, dom, io_alloc_closid, CDP_NONE);
> 		hw_shareable |= ctrl_val;
> 	}
> 
> 	for (i = r->cache.cbm_len - 1; i >= 0; i--) {
> 		/* Write annotated bitmask to user space */
> 	}
> 	

Here is the behaviour after these changes.

=== Before io_alloc enabled ==============================

# cd /sys/fs/resctrl/info/L3/
# cat io_alloc
disabled

# cat shareable_bits
0  (This is always 0 for AMD)

# cat bit_usage
0=SSSSSSSSSSSSSSSS;1=SSSSSSSSSSSSSSSS;2=SSSSSSSSSSSSSSSS;3=SSSSSSSSSSSSSSSS

==== After io_alloc enabled =================================

# echo 1 > io_alloc

# cat io_alloc
enabled

# cat io_alloc_cbm
L3:0=ffff;1=ffff;2=ffff;3=ffff

# cat bit_usage
0=XXXXXXXXXXXXXXXX;1=XXXXXXXXXXXXXXXX;2=XXXXXXXXXXXXXXXX;3=XXXXXXXXXXXXXXXX

==== After changing io_alloc_cbm ============================

# echo "L3:0=ff00;1=ff00;2=ff00;3=ff00" > io_alloc_cbm

# cat io_alloc_cbm
L3:0=ff00;1=ff00;2=ff00;3=ff00

# cat bit_usage
0=XXXXXXXXSSSSSSSS;1=XXXXXXXXSSSSSSSS;2=XXXXXXXXSSSSSSSS;3=XXXXXXXXSSSSSSSS
=============================================================

My concern here is that this may imply the property is present across all
CLOSes in all the domains, while in fact it only applies to a single
CLOS (15) within each domain.

Thanks
Babu Moger



