Hi Tony,

On Fri, May 23, 2025 at 11:08 PM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
> On Thu, May 22, 2025 at 10:16:16PM +0000, Luck, Tony wrote:
> > > It looks to me as though there are a couple of changes in the telemetry work
> > > that would benefit this work. https://lore.kernel.org/lkml/20250521225049.132551-2-tony.luck@xxxxxxxxx/
> > > switches the monitor events to be maintained in an array indexed by event ID, eliminating the
> > > need for searching the evt_list that this work does in a couple of places. Also note the handy
> > > new for_each_mbm_event() helper (https://lore.kernel.org/lkml/20250521225049.132551-5-tony.luck@xxxxxxxxx/).
> >
> > Yesterday I ran through the exercise of rebasing my AET patches on top of these
> > ABMC patches in order to check whether the ABMC patches painted resctrl
> > into some corner that would be hard to get back out of.
> >
> > Good news: they don't.
> >
> > There was a bunch of manual patching to make the first four patches fit on top
> > of the ABMC code, but I also noticed a few places where things were simpler
> > after combining the two series.
> >
> > Maybe a good path forward would be to take those first four patches from
> > my AET series and then build ABMC on top of those.
>
> As an encouragement to try this direction, I took my four patches
> on top of tip x86/cache and then applied Babu's ABMC series.

I did the same thing last week, except in the other order, so I
switched to your changes to test.

>
> Changes to Babu's code:
> 1) Adapt where needed for removal of evt_list. Use event array instead.
> 2) Use for_each_mbm_event() [Maybe didn't get all places?]
> 3) Bring the s/evt_val/evt_cfg/ fix into patch 20 from 21
> 4) Fix fir tree declaration for resctrl_process_assign()
>
> I don't have an AMD system to check if the ABMC parts still work. But
> it does pass the resctrl self tests, so legacy isn't broken.
>
> Patches in the "my_mbm_plus_babu_abmc" branch of my kernel.org
> repo: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git

Thanks for applying my suggestion[1] about the array entry sizes, but
you needed one more dereference:

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1db6a61e27746..0c27e0a5a7b96 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -399,7 +399,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
  */
 static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
 {
-       size_t tsize = sizeof(hw_dom->arch_mbm_states[0]);
+       size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
        enum resctrl_event_id evt;
        int idx;

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 098ff002d2232..44ec33cb165f7 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4819,7 +4823,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
 static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
 {
        u32 idx_limit = resctrl_arch_system_num_rmid_idx();
-       size_t tsize = sizeof(d->mbm_states[0]);
+       size_t tsize = sizeof(*d->mbm_states[0]);
        enum resctrl_event_id evt;
        int idx;

You should be able to repro an array overrun without ABMC, and a page
fault is likely if the system implements a lot of RMIDs. The AMD EPYC
9B45 I tested on implements 4096 RMIDs.

Thanks,
-Peter

[1] https://lore.kernel.org/lkml/CALPaoCj8yfzJ=5CkxTPQXc0-WRWpu0xKRX8v4FAWFGQKtXtMUw@xxxxxxxxxxxxxx/
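
For anyone following along, the sizing issue behind the two tsize changes
can be sketched in userspace C. This is only an illustration under stated
assumptions: struct demo_state, struct demo_domain and the demo_* helpers
are hypothetical stand-ins, not the kernel's types or layout. The point is
that when the member is an array of pointers, sizeof(states[0]) is the size
of a pointer, while sizeof(*states[0]) is the size of the pointed-to entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-RMID state; NOT the kernel's struct layout. */
struct demo_state {
	uint64_t prev_bytes;
	uint64_t chunks;
	uint64_t prev_bw;
};

#define DEMO_NUM_EVENTS 2

/* Hypothetical domain: one pointer per event, each to a per-RMID array. */
struct demo_domain {
	struct demo_state *states[DEMO_NUM_EVENTS];
};

/* The sketch only makes sense if an entry is larger than a pointer. */
static_assert(sizeof(struct demo_state) > sizeof(struct demo_state *),
	      "entry must be larger than a pointer");

/* Size of one array slot, i.e. of a pointer: too small for an entry. */
static size_t demo_tsize_buggy(const struct demo_domain *d)
{
	return sizeof(d->states[0]);	/* sizeof does not evaluate d */
}

/* Size of the entry the slot points to: what the allocation needs. */
static size_t demo_tsize_fixed(const struct demo_domain *d)
{
	return sizeof(*d->states[0]);	/* sizeof does not evaluate d */
}

/*
 * Bytes that fall outside an allocation of num_rmid * tsize bytes once
 * all num_rmid entries are accessed as struct demo_state.
 */
static size_t demo_overrun_bytes(size_t num_rmid, size_t tsize)
{
	size_t needed = num_rmid * sizeof(struct demo_state);
	size_t got = num_rmid * tsize;

	return needed > got ? needed - got : 0;
}
```

Calling demo_overrun_bytes(4096, demo_tsize_buggy(NULL)) gives the shortfall
for 4096 RMIDs with this made-up entry size, and the same call with
demo_tsize_fixed(NULL) gives 0. The real shortfall depends on the actual
size of the kernel's state structs, which this sketch does not reproduce.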