Re: [PATCH v13 00/27] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)

Reinette Chatre <reinette.chatre@xxxxxxxxx> · Thu, 22 May 2025 09:32:42 -0700

Hi Peter,

On 5/22/25 1:47 AM, Peter Newman wrote:
> Hi Tony, Reinette,
> 
> On Thu, May 22, 2025 at 2:21 AM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>>
>>>>>> There's also the mongroup-RMID overcommit use case I described
>>>>>> above[1]. On Intel we can safely assume that there are counters to
>>>>>> back all RMIDs, so num_mbm_cntrs would be calculated directly from
>>>>>> num_rmids.
>>>>>
>>>>> This is about the:
>>>>>    There's now more interest in Google for allowing explicit control of
>>>>>    where RMIDs are assigned on Intel platforms. Even though the number of
>>>>>    RMIDs implemented by hardware tends to be roughly the number of
>>>>>    containers they want to support, they often still need to create
>>>>>    containers when all RMIDs have already been allocated, which is not
>>>>>    currently allowed. Once the container has been created and starts
>>>>>    running, it's no longer possible to move its threads into a monitoring
>>>>>    group whenever RMIDs should become available again, so it's important
>>>>>    for resctrl to maintain an accurate task list for a container even
>>>>>    when RMIDs are not available.
>>>>>
>>>>> I see a monitor group as a collection of tasks that need to be monitored together.
>>>>> The "task list" is the group of tasks that share a monitoring ID that
>>>>> is required to be a valid ID since when any of the tasks are scheduled that ID is
>>>>> written to the hardware. I intentionally tried to not use RMID since I believe
>>>>> this is required for all archs.
>>>>> I thus do not understand how a task can start running when it does not have
>>>>> a valid monitoring ID. The idea of "deferred assignment" is not clear to me,
>>>>> there can never be "unmonitored tasks", no? I think I am missing something here.
> 
> You are correct. I did forget to mention something...
> 
>>>>
>>>> In the AMD/RMID implementation this might be achieved with something
>>>> extra in the task structure to denote whether a task is in a monitored
>>>> group or not. E.g. We add "task->rmid_valid" as well as "task->rmid".
>>>> Tasks in an unmonitored group retain their "task->rmid" (that's what
>>>> identifies them as a member of a group) but have task->rmid_valid set
>>>> to false.  Context switch code would be updated to load "0" into the
>>>> IA32_PQR_ASSOC.RMID field for tasks without a valid RMID. So they
>>>> would still be monitored, but activity would be bundled with all
>>>> tasks in the default resctrl group.
>>>>
>>>> Presumably something analogous could be done for ARM/MPAM.
>>>>
>>>
>>> I do not interpret this as an unmonitored task but instead a task that
>>> belongs to the default resource group. Specifically, any data accumulated by
>>> such a task is attributed to the default resource group. Having tasks
>>> in a separate group but their monitoring data accumulating in/contributed to
>>> the default resource group (that has its own set of tasks) sounds wrong to me.
>>> Such an implementation makes any monitoring data of default resource group
>>> invalid, and by extension impossible to use default resource group to manage
>>> an allocation for a group of monitor groups if user space needs insight
>>> in monitoring data across all these monitor groups. User space will need to
>>> interact with resctrl differently and individually query monitor groups instead
>>> of CTRL_MON group once.
>>
>> Maybe assign one of the limited supply of RMIDs for these "unmonitored"
>> tasks. Populate a resctrl group named "unmonitored" that lists all the
>> unmonitored tasks in a (read-only) "tasks" file. And supply all the counts
>> for these tasks in normal looking "mon_data" directory.
> 
> I needed to switch to an rdtgroup struct pointer rather than hardware
> IDs in the task structure to indicate group membership[1], otherwise
> it's not possible to determine which tasks are in a group when it
> doesn't have a unique HW ID value.

Whether or not the task struct contains a pointer (which brings its own
complexities), that does not address the issue that I am concerned about.

Looking at [1], I expect this new feature handles "unmonitored" groups by
placing them in the default monitoring group, following Tony's first
suggestion [3].
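
To make sure we are talking about the same thing, here is how I read that
first suggestion, written as a small user-space sketch. This is only my
interpretation: struct task_mon, effective_rmid(), DEFAULT_RMID and example()
are names made up for illustration and are not taken from [1] or from the
kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-task monitoring state, not task_struct. */
struct task_mon {
	uint32_t rmid;		/* identifies the group the task is a member of */
	bool rmid_valid;	/* false if the group has no hardware RMID behind it */
};

/* Assumption: RMID 0 is the one used by the default resctrl group. */
#define DEFAULT_RMID 0U

/* Value the context switch code would load into IA32_PQR_ASSOC.RMID. */
static uint32_t effective_rmid(const struct task_mon *t)
{
	return t->rmid_valid ? t->rmid : DEFAULT_RMID;
}

/* A task of the default group and a task of an "unmonitored" group. */
void example(void)
{
	struct task_mon in_default = { .rmid = DEFAULT_RMID, .rmid_valid = true };
	struct task_mon unmonitored = { .rmid = 7, .rmid_valid = false };

	/* Both load the same RMID, so their counts can no longer be separated. */
	assert(effective_rmid(&in_default) == effective_rmid(&unmonitored));
}
```

The assert in example() is the part I am concerned about: the "unmonitored"
task keeps its group membership but its activity lands in the default
group's counters.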

When considering [1] by itself in the context of current resctrl, all tasks
should be members of resource groups that have valid HW monitoring IDs allocated.
Seen that way, using the default resource group as [1] does looks like it
addresses edge cases where the pointer is not yet valid (it is unclear to me
what these scenarios may be) rather than routing many tasks to the default
group. I am not sure, and I will have to study that change more closely to
reason about it accurately.