Hi Reinette,

On Wed, May 21, 2025 at 1:44 AM Reinette Chatre <reinette.chatre@xxxxxxxxx> wrote:
>
> Hi Babu,
>
> On 5/20/25 4:25 PM, Moger, Babu wrote:
> > Hi Reinette,
> >
> > On 5/20/2025 1:23 PM, Reinette Chatre wrote:
> >> Hi Babu,
> >>
> >> On 5/20/25 10:51 AM, Moger, Babu wrote:
> >>> Hi Reinette,
> >>>
> >>> On 5/20/25 11:06, Reinette Chatre wrote:
> >>>> Hi Babu,
> >>>>
> >>>> On 5/20/25 8:28 AM, Moger, Babu wrote:
> >>>>> On 5/19/25 10:59, Peter Newman wrote:
> >>>>>> On Fri, May 16, 2025 at 12:52 AM Babu Moger <babu.moger@xxxxxxx> wrote:
> >>>>
> >>>> ...
> >>>>
> >>>>>>> /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs: Reports the number of monitoring
> >>>>>>> counters available for assignment.
> >>>>>>
> >>>>>> Earlier I discussed with Reinette[1] what num_mbm_cntrs should
> >>>>>> represent in a "soft-ABMC" implementation where assignment is
> >>>>>> implemented by assigning an RMID, which would result in all events
> >>>>>> being assigned at once.
> >>>>>>
> >>>>>> My main concern is how many "counters" you can assign by assigning
> >>>>>> RMIDs. I recall Reinette proposed reporting the number of groups which
> >>>>>> can be assigned separately from counters which can be assigned.
> >>>>>
> >>>>> More context may be needed here. Currently, num_mbm_cntrs indicates the
> >>>>> number of counters available per domain, which is 32.
> >>>>>
> >>>>> At the moment, we can assign 2 counters to each group, meaning each RMID
> >>>>> can be associated with 2 hardware counters. In theory, it's possible to
> >>>>> assign all 32 hardware counters to a group—allowing one RMID to be linked
> >>>>> with up to 32 counters. However, we currently lack the interface to
> >>>>> support that level of assignment.
> >>>>>
> >>>>> For now, the plan is to support basic assignment and expand functionality
> >>>>> later once we have the necessary data structure and requirements.
> >>>>
> >>>> Looks like some requirements did not make it into this implementation.
> >>>> Do you recall the discussion that resulted in you writing [2]? Looks like
> >>>> there is a question to Peter in there on how to determine how many "counters"
> >>>> are available in soft-ABMC. I interpreted [3] at that time to mean that this
> >>>> information would be available in a future AMD publication.
> >>>
> >>> We already have a method to determine the number of counters in soft-ABMC
> >>> mode, which Peter has addressed [4].
> >>>
> >>> [4]
> >>> https://lore.kernel.org/lkml/20250203132642.2746754-1-peternewman@xxxxxxxxxx/
> >>>
> >>> This appears to be more of a workaround, and I doubt it will be included
> >>> in any official AMD documentation. Additionally, the long-term direction
> >>> is moving towards ABMC.
> >>>
> >>> I don’t believe this workaround needs to be part of the current series. It
> >>> can be added later when soft-ABMC is implemented.
> >>
> >> Agreed. What about the plans described in [2]? (Thanks to Peter for
> >> catching this!).
> >>
> >> It is important to keep track of requirements while working on a feature to
> >> ensure that the implementation supports the planned use cases. Re-reading that
> >> thread it is not clear to me how soft-ABMC's per-group assignment would look.
> >> Could you please share how you see it progress from this implementation?
> >> This includes the single event vs. multiple event assignment. I would like to
> >> highlight that this is not a request for this to be supported in this implementation
> >> but there needs to be a plan for how this can be supported on top of interfaces
> >> established by this work.
> >>
> >
> > Here’s my current understanding of soft-ABMC. Peter may have a more in-depth perspective on this.
> >
> > Soft-ABMC:
> > a. num_mbm_cntrs: This is a software-defined limit based on the number of active RMIDs that can be supported. The value can be obtained using the code referenced in [4].
> >
> > b. Assignments: No hardware configuration is required.
> > We simply need to ensure that no more than num_mbm_cntrs RMIDs are active at any given time.
> >
> > c. Configuration: Controlled via /info/L3_MON/mbm_total_bytes_config and mbm_local_bytes_config.
> >
> > d. Events: Only two events can be assigned(local and total).
> >
> > ABMC:
> > a. num_mbm_cntrs: This is defined by the hardware.
> > b. Assignments: Requires special MSR writes to assign counters.
> > c. Configuration: Comes from /info/L3_MON/counter_configs/.
> > d. Events: More than two events can be assigned to a group (currently up to 2).
> >
> > Commonalities:
> > a. Assignments can be either exclusive or shared in both these modes.
> >
> > Given these, I believe we can easily accommodate soft-ABMC in this interface.
>
> This is not so obvious to me. It looks to me as though the user is forced to interpret
> the content of resctrl files differently based on soft-ABMC vs ABMC making the interface
> inconsistent and user thus needing to know details of implementations. This is what the previous
> discussion I linked to aimed to address. It sounds to me as though you believe that this is no longer
> an issue. Could you please show examples of what a user can expect from the interfaces and how a user
> will interact with the interfaces on both a non-ABMC and ABMC system?

At the interface level, I think mbm_L3_assignments on a non-ABMC
system would only need to contain a single line:

0=s;1=s;...;31=s

But maybe for consistency we would synthesize a single, unmodifiable
counter configuration to reflect that allocating an RMID in a domain
results in assignment to all events and deallocating the RMID
unassigns all events. We could call it "group" to say it's assigning
at the group level, or perhaps just '*':

*:0=s;1=s;...;31=s

I'm not sure about allowing a '*' on ABMC hardware, because it could
be interpreted as allocating a lot of counters when a large number of
event configurations exist.
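
To make the proposed line format concrete, here is a minimal Python
sketch of how a userspace tool might parse it. This is only an
illustration under assumptions: the helper name parse_assignment_line
is hypothetical, the format is the proposal from this mail rather than
a settled resctrl ABI, and the per-domain flag (such as 's') is treated
as an opaque string.

```python
def parse_assignment_line(line):
    """Parse '[<config>:]<domain>=<flag>;...' into (config, {domain: flag}).

    Hypothetical helper: the format follows the proposal in this mail,
    not a finalized kernel interface.
    """
    line = line.strip()
    config = None
    # An optional "<config>:" prefix names the counter configuration,
    # for example "*" for group-level assignment.
    if ":" in line:
        config, line = line.split(":", 1)
    domains = {}
    for item in line.split(";"):
        if not item:
            continue  # tolerate a trailing ';'
        domain, flag = item.split("=", 1)
        domains[int(domain)] = flag  # flag kept as an opaque string
    return config, domains


print(parse_assignment_line("*:0=s;1=s;31=s"))
print(parse_assignment_line("0=s;1=s"))
```

The "..." in the examples above is an elision and has to be written
out as real domain entries; the sketch raises ValueError on it instead
of guessing.
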
*:0=s;1=s;...;31=s

-Peter

>
> Thank you
>
> Reinette
>
> >
> >>>>
> >>>> [2] https://lore.kernel.org/lkml/afb99efe-0de2-f7ad-d0b8-f2a0ea998efd@xxxxxxx/
> >>>> [3] https://lore.kernel.org/lkml/CALPaoCg3KpF94g2MEmfP_Ro2mQZYFA8sKVkmb+7isotKNgdY9A@xxxxxxxxxxxxxx/
> >>>
> >>
> >>
> >
>