Hi Reinette,

On 5/22/25 15:51, Reinette Chatre wrote:
> Hi Babu,
>
> On 5/15/25 3:51 PM, Babu Moger wrote:
>> Users can create as many monitor groups as RMIDs supported by the hardware.
>> However, bandwidth monitoring feature on AMD system only guarantees that
>> RMIDs currently assigned to a processor will be tracked by hardware. The
>> counters of any other RMIDs which are no longer being tracked will be reset
>> to zero. The MBM event counters return "Unavailable" for the RMIDs that are
>> not tracked by hardware. So, there can be only limited number of groups
>> that can give guaranteed monitoring numbers. With ever changing
>> configurations there is no way to definitely know which of these groups are
>> being tracked for certain point of time. Users do not have the option to
>> monitor a group or set of groups for certain period of time without
>> worrying about RMID being reset in between.
>>
>> The ABMC feature provides an option to the user to assign a hardware
>> counter to an RMID, event pair and monitor the bandwidth as long as it is
>> assigned. The assigned RMID will be tracked by the hardware until the user
>> unassigns it manually. There is no need to worry about counters being reset
>> during this period. Additionally, the user can specify a bitmask
>> identifying the specific bandwidth types from the given source to track
>> with the counter.
>>
>> Without ABMC enabled, monitoring will work in current mode without
>> assignment option.
>>
>> The Linux resctrl subsystem provides an interface that allows monitoring of
>> up to two memory bandwidth events per group, selected from a combination of
>> available total and local events. When ABMC is enabled, two events will be
>> assigned to each group by default, in line with the current interface
>> design. Users will also have the option to configure which types of memory
>> transactions are counted by these events.
>>
>> Due to the limited number of available counters (32), users may quickly
>> exhaust the available counters. If the system runs out of assignable ABMC
>> counters, the kernel will report an error. In such cases, users will nee
>> dto unassign one or more active counters to free up countes for new
>
> "nee dto" -> "need to"
> "countes" -> "counters"

Sure.

>
>> assignments. The interface will provide options to assign or unassign
>
> "The interface will" -> "resctrl will"?
>

Sure.

>> events through the group-specific interface file.
>>
>> The feature can be detected via CPUID_Fn80000020_EBX_x00 bit 5.
>
> "The feature can be detected" -> "The feature is detected"
>

Sure.

>> Bits Description
>> 5 ABMC (Assignable Bandwidth Monitoring Counters)
>>
>> The feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
>> ---
>
> ...
>
>> arch/x86/include/asm/cpufeatures.h | 1 +
>> arch/x86/kernel/cpu/cpuid-deps.c | 2 ++
>> arch/x86/kernel/cpu/scattered.c | 1 +
>> 3 files changed, 4 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>> index 6c2c152d8a67..d5c14dc678df 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -481,6 +481,7 @@
>>  #define X86_FEATURE_AMD_HETEROGENEOUS_CORES (21*32 + 6) /* Heterogeneous Core Topology */
>>  #define X86_FEATURE_AMD_WORKLOAD_CLASS (21*32 + 7) /* Workload Classification */
>>  #define X86_FEATURE_PREFER_YMM (21*32 + 8) /* Avoid ZMM registers due to downclocking */
>> +#define X86_FEATURE_ABMC (21*32 + 9) /* Assignable Bandwidth Monitoring Counters */
>>
>>  /*
>>   * BUG word(s)
>> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
>> index a2fbea0be535..2f54831e04e5 100644
>> --- a/arch/x86/kernel/cpu/cpuid-deps.c
>> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
>> @@ -71,6 +71,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>>  { X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
>>  { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_TOTAL },
>>  { X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_LOCAL },
>> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_TOTAL },
>> + { X86_FEATURE_ABMC, X86_FEATURE_CQM_MBM_LOCAL },
>
> Is this dependency still accurate now that the implementation switched to the
> "extended event ID" variant of ABMC that no longer uses the event IDs associated
> with X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL?

That's a good question. Unfortunately, we may need to retain this
dependency for now, as a significant portion of the code relies on
functions like resctrl_is_mbm_event(), resctrl_is_mbm_enabled(),
resctrl_arch_is_mbm_total_enabled(), and others.
>
>>  { X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL },
>>  { X86_FEATURE_AVX512_FP16, X86_FEATURE_AVX512BW },
>>  { X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES },
>> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
>> index 16f3ca30626a..3b72b72270f1 100644
>> --- a/arch/x86/kernel/cpu/scattered.c
>> +++ b/arch/x86/kernel/cpu/scattered.c
>> @@ -49,6 +49,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>>  { X86_FEATURE_MBA, CPUID_EBX, 6, 0x80000008, 0 },
>>  { X86_FEATURE_SMBA, CPUID_EBX, 2, 0x80000020, 0 },
>>  { X86_FEATURE_BMEC, CPUID_EBX, 3, 0x80000020, 0 },
>> + { X86_FEATURE_ABMC, CPUID_EBX, 5, 0x80000020, 0 },
>>  { X86_FEATURE_AMD_WORKLOAD_CLASS, CPUID_EAX, 22, 0x80000021, 0 },
>>  { X86_FEATURE_PERFMON_V2, CPUID_EAX, 0, 0x80000022, 0 },
>>  { X86_FEATURE_AMD_LBR_V2, CPUID_EAX, 1, 0x80000022, 0 },
>
> Reinette
>

-- 
Thanks
Babu Moger