Hi Reinette,

On 5/5/25 11:22, Reinette Chatre wrote:
> Hi Babu,
>
> On 5/2/25 5:53 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> Thanks for the quick turnaround.
>>
>> On 5/2/2025 4:20 PM, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 4/21/25 3:43 PM, Babu Moger wrote:
>>>> # Linux Implementation
>>>>
>>>> The feature adds the following interface files when the resctrl
>>>> "io_alloc" feature is supported on the L3 resource:
>>>>
>>>> /sys/fs/resctrl/info/L3/io_alloc: Reports the feature status.
>>>> Enable/disable the feature by writing to the interface.
>>>>
>>>> /sys/fs/resctrl/info/L3/io_alloc_cbm: Lists the Capacity Bit Masks
>>>> (CBMs) available for I/O devices when the io_alloc feature is
>>>> enabled. Configure the CBM by writing to the interface.
>>>>
>>>> # Examples:
>>>>
>>>> a. Check if the io_alloc feature is available:
>>>>
>>>> # mount -t resctrl resctrl /sys/fs/resctrl/
>>>> # cat /sys/fs/resctrl/info/L3/io_alloc
>>>> disabled
>>>>
>>>> b. Enable the io_alloc feature:
>>>>
>>>> # echo 1 > /sys/fs/resctrl/info/L3/io_alloc
>>>> # cat /sys/fs/resctrl/info/L3/io_alloc
>>>> enabled
>>>>
>>>> c. Check the CBM values for the io_alloc feature:
>>>>
>>>> # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
>>>> L3:0=ffff;1=ffff
>>>>
>>>> d. Change the CBM value for domain 1:
>>>>
>>>> # echo L3:1=FF > /sys/fs/resctrl/info/L3/io_alloc_cbm
>>>> # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
>>>> L3:0=ffff;1=00ff
>>>>
>>>> e. Disable the io_alloc feature and exit:
>>>>
>>>> # echo 0 > /sys/fs/resctrl/info/L3/io_alloc
>>>> # cat /sys/fs/resctrl/info/L3/io_alloc
>>>> disabled
>>>>
>>>> # umount /sys/fs/resctrl/
>>>>
>>>
>>> From what I can tell the interface when CDP is enabled will look
>>> as follows:
>>>
>>> # mount -o cdp -t resctrl resctrl /sys/fs/resctrl/
>>> # cat /sys/fs/resctrl/info/L3CODE/io_alloc
>>> disabled
>>> # cat /sys/fs/resctrl/info/L3DATA/io_alloc
>>> not supported
>>>
>>> "io_alloc" can thus be enabled for L3CODE but not for L3DATA.
>>> This is unexpected considering the feature is called
>>> "L3 Smart *Data* Cache Injection Allocation Enforcement".
>>>
>>> I understand that the interface evolved into this because the
>>> "code" allocation of CDP uses the CLOSID required by SDCIAE but I
>>> think leaking implementation details like this to the user interface
>>> can cause confusion.
>>>
>>> Since there is no distinction between code and data in these
>>> IO allocations, what do you think of connecting the io_alloc and
>>> io_alloc_cbm files within L3CODE and L3DATA so that the user can
>>> read/write from either, with a read showing the same data and the
>>> user able to write to either? For example:
>>>
>>> # mount -o cdp -t resctrl resctrl /sys/fs/resctrl/
>>> # cat /sys/fs/resctrl/info/L3CODE/io_alloc
>>> disabled
>>> # cat /sys/fs/resctrl/info/L3DATA/io_alloc
>>> disabled
>>> # echo 1 > /sys/fs/resctrl/info/L3CODE/io_alloc
>>> # cat /sys/fs/resctrl/info/L3CODE/io_alloc
>>> enabled
>>> # cat /sys/fs/resctrl/info/L3DATA/io_alloc
>>> enabled
>>> # cat /sys/fs/resctrl/info/L3DATA/io_alloc_cbm
>>> 0=ffff;1=ffff
>>> # cat /sys/fs/resctrl/info/L3CODE/io_alloc_cbm
>>> 0=ffff;1=ffff
>>> # echo 1=FF > /sys/fs/resctrl/info/L3DATA/io_alloc_cbm
>>> # cat /sys/fs/resctrl/info/L3DATA/io_alloc_cbm
>>> 0=ffff;1=00ff
>>> # cat /sys/fs/resctrl/info/L3CODE/io_alloc_cbm
>>> 0=ffff;1=00ff
>>
>> I agree. There is no right or wrong here. It can be done the way you
>> mentioned above. But I am not sure it will clear the confusion.
>>
>> We have already added this text in the user doc (the spec says the
>> same):
>>
>> "On AMD systems, the io_alloc feature is supported by the L3 Smart
>> Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for
>> io_alloc is determined by the highest CLOSID supported by the resource.
>> When CDP is enabled, io_alloc routes I/O traffic using the highest
>> CLOSID allocated for the instruction cache (L3CODE)."
>>
>> Don't you think this text might clear the confusion? We can also add
>> examples if that makes it even more clear.
>
> The user interface is not intended to be a mirror of the hardware
> interface. Even if it were, mirroring is becoming increasingly difficult
> with multiple architectures with different hardware interfaces needing
> to use the same user interface for control. Remember, there are no
> "CLOSIDs" in MPAM and I do not know details of what RISC-V brings.
>
> We should aim to have something as generic as possible that makes sense
> for user space. All the hardware interface details should be hidden as
> much as possible from the user interface. When we expose the hardware
> interface details it becomes very difficult to support new use cases.
>
> The only aspect of "closids" that has been exposed to user space thus
> far is "num_closids", and in user documentation a CLOSID has been linked
> to the number of control groups. That is the only constraint we need to
> think about here. I have repeatedly asked for the IO alloc connection
> with CLOSIDs not to be exposed to user space (yet user documentation and
> messages to user space keep doing so in this series). Support for IO
> alloc in this way is unique to AMD. We do not want resctrl to be
> constrained like this if another architecture needs to support some form
> of IO alloc and does so in a different way.
>
> I understand that IO alloc backed by a CLOSID is forming part of resctrl
> fs in this implementation and that is ok for now. As long as we do not
> leak this to user space it gives us flexibility to change resctrl fs
> when/if we learn of different architecture needs later.

That makes sense. I'll go ahead and adjust it as suggested.
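To make sure we are aligned, here is roughly what I have in mind (an
untested sketch only; io_alloc_conf_type() is a name I made up for
illustration). Both the L3CODE and L3DATA io_alloc files would resolve
to the same helpers, so a read through either file shows the same data
and a write through either lands in the same configuration:

	/*
	 * Untested sketch: resolve the configuration type that backs
	 * io_alloc, independent of which CDP info directory the user
	 * went through. io_alloc_conf_type() is a made-up name.
	 */
	static enum resctrl_conf_type io_alloc_conf_type(struct rdt_resource *r)
	{
		/* With CDP enabled, I/O traffic uses the L3CODE CLOSID. */
		return resctrl_arch_get_cdp_enabled(r->rid) ? CDP_CODE : CDP_NONE;
	}

Reading the CBM through either info/L3CODE/io_alloc_cbm or
info/L3DATA/io_alloc_cbm would then come down to the same call:

	ctrl_val = resctrl_arch_get_config(r, dom, io_alloc_closid,
					   io_alloc_conf_type(r));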
>> >> "On AMD systems, the io_alloc feature is supported by the L3 Smart >> Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for >> io_alloc is determined by the highest CLOSID supported by the resource. >> When CDP is enabled, io_alloc routes I/O traffic using the highest >> CLOSID allocated for the instruction cache (L3CODE). >> >> Dont you think this text might clear the confusion? We can add examples also if that makes it even more clear. > > The user interface is not intended to be a mirror of the hardware interface. > If it was, doing so is becoming increasingly difficult with multiple > architectures with different hardware intefaces needing to use the same > user interface for control. Remember, there are no "CLOSID" in MPAM and > I do not know details of what RISC-V brings. > > We should aim to have something as generic as possible that makes sense > for user space. All the hardware interface details should be hidden as much > as possible from user interface. When we expose the hardware interface details > it becomes very difficult to support new use cases. > > The only aspect of "closids" that has been exposed to user space thus far > is the "num_closids" and in user documentation a CLOSid has been linked to the > number of control groups. That is the only constraint we need to think about > here. I have repeatedly asked for IO alloc connection with CLOSIDs to not be exposed > to user space (yet user documentation and messages to user space keeps doing so > in this series). Support for IO alloc in this way is unique to AMD. We do not want > resctrl to be constrained like this if another architecture needs to support > some form of IO alloc and does so in a different way. > > I understand that IO alloc backed by CLOSID is forming part of resctrl fs in this > implementation and that is ok for now. As long as we do not leak this to user space > it gives use flexibility to change resctrl fs when/if we learn different architecture > needs later. That makes sense. I’ll go ahead and adjust it as suggested. > >>> (Note in above I removed the resource name from io_alloc_cbm to match >>> what was discussed during previous version: >>> https://lore.kernel.org/lkml/251c8fe1-603f-4993-a822-afb35b49cdfa@xxxxxxx/ ) >>> What do you think? >> >> Yes. I remember. "Kept the resource name while printing the CBM for io_alloc, so we dont have to change show_doms() just for this feature and it is consistant across all the schemata display. > > It almost sounds like you do not want to implement something because the > code to support it does not exist? > >> >> I added the note in here. >> https://lore.kernel.org/lkml/784fbc61e02e9a834473c3476ee196ef6a44e338.1745275431.git.babu.moger@xxxxxxx/ > > You mention "I dont have to change show_doms() just for this feature and it is > consistant across all the schemata display." > I am indeed seeing a pattern where one goal is to add changes by changing minimum > amount of code. Please let this not be a goal but instead make it a goal to integrate > changes into resctrl appropriately, not just pasted on top. > > When it comes to the schemata display then it makes sense to add the resource name since > the schemata file is within a resource group containing multiple resources and the schemata > file thus needs to identify resources. Compare this to, for example, the "bit_usage" file > that is unique to a resource and thus no need to identify the resource. > >> >> I will change it if you feel strongly about it. 
>>>> ---
>>>> v4: The "io_alloc" interface will report "enabled/disabled/not
>>>> supported" instead of 0 or 1.
>>>>
>>>> Updated resctrl_io_alloc_closid_get() to verify the max CLOSID
>>>> availability using closids_supported().
>>>>
>>>> Updated the documentation for "shareable_bits" and "bit_usage".
>>>>
>>>> NOTE: io_alloc is about a specific CLOS. rdt_bit_usage_show() is not
>>>> designed to handle bit_usage for a specific CLOS; it is about the
>>>> overall system. So, we cannot really tell the user which CLOS is
>>>> shared across both hardware and software.
>>>
>>> "bit_usage" is not about CLOS but about how the resource is used. Per
>>> the doc:
>>>
>>> "bit_usage":
>>>         Annotated capacity bitmasks showing how all
>>>         instances of the resource are used.
>>>
>>> The key here is the CBM, not the CLOS. For each bit in the *CBM*,
>>> "bit_usage" shows how that portion of the cache is used, with the
>>> legend documented in Documentation/arch/x86/resctrl.rst.
>>>
>>> Consider a system with the following allocations:
>>>
>>> # cat /sys/fs/resctrl/schemata
>>> L3:0=0ff0
>>
>> This is CLOS 0.
>>
>>> # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
>>> 0=ff00
>>
>> This is CLOS 15.
>>
>>> Then "bit_usage" will look like:
>>>
>>> # cat /sys/fs/resctrl/info/L3/bit_usage
>>> 0=HHHHXXXXSSSS0000
>>
>> It is confusing here. To make it clear we may have to print all the
>> CLOSes in each domain.
>
> Could you please elaborate how this is confusing?

# cat /sys/fs/resctrl/info/L3/bit_usage
0=HHHHXXXXSSSS0000

This may give the impression that all CLOSes in all domains carry this
property, but in reality it applies only to one CLOS (15) within each
domain. Example below....

>
>> # cat /sys/fs/resctrl/info/L3/bit_usage
>> DOM0=CLOS0:SSSSSSSSSSSSSSSS;... ;CLOS15=HHHHXXXXSSSS0000;
>> DOM1=CLOS0:SSSSSSSSSSSSSSSS;... ;CLOS15=HHHHXXXXSSSS0000
>
> Please no. Not only does this change the existing user interface, it
> also breaks the goal of "bit_usage".
>
> Please think of it from the user's perspective. If the user wants to
> know, for example, "how is my L3 cache allocated", then the "bit_usage"
> file provides that summary.
>
>>> "bit_usage" shows how the cache is being used. It shows that the
>>> portion of cache represented by the first four bits of the CBM is
>>> unused, the portion represented by bits 4 to 7 is only used by
>>> software, the portion represented by bits 8 to 11 is shared between
>>> software and hardware, and the portion represented by bits 12 to 15
>>> is only used by hardware.
>>>
>>>> This is something we need to discuss.
>>>
>>> Looking at the implementation in patch #5, the "io_alloc_cbm" bits of
>>> the CBM are presented as software bits; since "io_alloc_cbm"
>>> represents IO from devices, they should be "hardware" bits
>>> (hw_shareable), no?
>>
>> Yes. It is. But the logic is a bit different there.
>>
>> It loops through all the CLOSes on the domain. So, it will print again
>> like this below.
>
> This is what the current code does, but the code can be changed, no?
> For example, rdt_bit_usage_show() does not need to treat the IO
> allocation like all the other resource groups but can instead handle it
> separately. Below is some pseudo code that presents the idea, untested,
> not compiled.
>
> hw_shareable = r->cache.shareable_bits;
>
> for (i = 0; i < closids_supported(); i++) {
>         if (!closid_allocated(i) ||
>             (resctrl_arch_get_io_alloc_enabled(r) &&
>              i == resctrl_io_alloc_closid_get(r, s)))
>                 continue;
>
>         /* Initialize sw_shareable and exclusive */
> }
>
> if (resctrl_arch_get_io_alloc_enabled(r)) {
>         /*
>          * Sidenote: I do not think the schemata parameter is needed
>          * for resctrl_io_alloc_closid_get()

Sure. Got it.

>          */
>         io_alloc_closid = resctrl_io_alloc_closid_get(r, s);
>         if (resctrl_arch_get_cdp_enabled(r->rid))
>                 ctrl_val = resctrl_arch_get_config(r, dom, io_alloc_closid,
>                                                    CDP_CODE);
>         else
>                 ctrl_val = resctrl_arch_get_config(r, dom, io_alloc_closid,
>                                                    CDP_NONE);
>         hw_shareable |= ctrl_val;
> }
>
> for (i = r->cache.cbm_len - 1; i >= 0; i--) {
>         /* Write annotated bitmask to user space */
> }
>

Here is the behaviour after these changes.

=== Before io_alloc is enabled ==============================

# cd /sys/fs/resctrl/info/L3/
# cat io_alloc
disabled
# cat shareable_bits
0        (this is always 0 for AMD)
# cat bit_usage
0=SSSSSSSSSSSSSSSS;1=SSSSSSSSSSSSSSSS;2=SSSSSSSSSSSSSSSS;3=SSSSSSSSSSSSSSSS

==== After io_alloc is enabled ==============================

# echo 1 > io_alloc
# cat io_alloc
enabled
# cat io_alloc_cbm
L3:0=ffff;1=ffff;2=ffff;3=ffff
# cat bit_usage
0=XXXXXXXXXXXXXXXX;1=XXXXXXXXXXXXXXXX;2=XXXXXXXXXXXXXXXX;3=XXXXXXXXXXXXXXXX

==== After changing io_alloc_cbm ============================

# echo "L3:0=ff00;1=ff00;2=ff00;3=ff00" > io_alloc_cbm
# cat io_alloc_cbm
L3:0=ff00;1=ff00;2=ff00;3=ff00
# cat bit_usage
0=XXXXXXXXSSSSSSSS;1=XXXXXXXXSSSSSSSS;2=XXXXXXXXSSSSSSSS;3=XXXXXXXXSSSSSSSS

=============================================================

My concern here is that this may imply the property is present across
all CLOSes in all the domains, while in fact it only applies to a single
CLOS (15) within each domain.

Thanks
Babu Moger