Re: [PATCH v13 11/27] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC

Hi Tony,

On Fri, May 23, 2025 at 11:08 PM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
> On Thu, May 22, 2025 at 10:16:16PM +0000, Luck, Tony wrote:
> > > It looks to me as though there are a couple of changes in the telemetry work
> > > that would benefit this work. https://lore.kernel.org/lkml/20250521225049.132551-2-tony.luck@xxxxxxxxx/
> > > switches the monitor events to be maintained in an array indexed by event ID, eliminating the
> > > need for searching the evt_list that this work does in a couple of places. Also note the handy
> > > new for_each_mbm_event() helper (https://lore.kernel.org/lkml/20250521225049.132551-5-tony.luck@xxxxxxxxx/).
> >
> > Yesterday I ran through the exercise of rebasing my AET patches on top of these
> > ABMC patches in order to check whether the ABMC patches painted resctrl
> > into some corner that would be hard to get back out of.
> >
> > Good news: they don't.
> >
> > There was a bunch of manual patching to make the first four patches fit on top
> > of the ABMC code, but I also noticed a few places where things were simpler
> > after combining the two series.
> >
> > Maybe a good path forward would be to take those first four patches from
> > my AET series and then build ABMC on top of those.
>
> As an encouragement to try this direction, I took my four patches
> on top of tip x86/cache and then applied Babu's ABMC series.

I did the same thing last week, except in the other order, so I
switched to your changes to test.

>
> Changes to Babu's code:
> 1) Adapt where needed for removal of evt_list. Use event array instead.
> 2) Use for_each_mbm_event() [Maybe didn't get all places?]
> 3) Bring the s/evt_val/evt_cfg/ fix into patch 20 from 21
> 4) Fix fir tree declaration for resctrl_process_assign()
>
> I don't have an AMD system to check if the ABMC parts still work. But
> it does pass the resctrl self tests, so legacy isn't broken.
>
> Patches in the "my_mbm_plus_babu_abmc" branch of my kernel.org
> repo: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git

Thanks for applying my suggestion[1] about the array entry sizes, but
you needed one more dereference:

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1db6a61e27746..0c27e0a5a7b96 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -399,7 +399,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
  */
 static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
 {
-       size_t tsize = sizeof(hw_dom->arch_mbm_states[0]);
+       size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
        enum resctrl_event_id evt;
        int idx;

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 098ff002d2232..44ec33cb165f7 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4819,7 +4823,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
 static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
 {
        u32 idx_limit = resctrl_arch_system_num_rmid_idx();
-       size_t tsize = sizeof(d->mbm_states[0]);
+       size_t tsize = sizeof(*d->mbm_states[0]);
        enum resctrl_event_id evt;
        int idx;


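Without the dereference, tsize is the size of a pointer rather than of
the state struct it points to, so the per-event state arrays are
under-allocated. You should be able to reproduce an array overrun
without ABMC, and a page fault is likely if the system implements a lot
of RMIDs. The AMD EPYC 9B45 I tested on implements 4096 RMIDs.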
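
To illustrate the sizeof difference outside the kernel, here is a minimal
standalone sketch. Everything prefixed demo_ is hypothetical and only
mimics the shape of the code in the diff (an array of per-event pointers,
each pointing to an array of per-RMID state); the real resctrl structs
have different members and sizes.

```c
#include <stddef.h>

/* Hypothetical stand-in for one per-RMID state entry (not the kernel struct). */
struct demo_mbm_state {
	unsigned long long chunks;
	unsigned long long prev_msr;
};

#define DEMO_NUM_EVENTS 2

/* Hypothetical stand-in for a domain: one state-array pointer per event. */
struct demo_domain {
	struct demo_mbm_state *mbm_states[DEMO_NUM_EVENTS];
};

/* Buggy form: mbm_states[0] is a pointer, so this is the size of a pointer. */
static size_t demo_wrong_tsize(const struct demo_domain *d)
{
	return sizeof(d->mbm_states[0]);
}

/*
 * Fixed form: the extra '*' makes sizeof look at the pointed-to struct.
 * sizeof does not evaluate its operand here, so a NULL pointer is harmless.
 */
static size_t demo_right_tsize(const struct demo_domain *d)
{
	return sizeof(*d->mbm_states[0]);
}

/* Bytes that an allocation of num_rmid entries of size tsize would request. */
static size_t demo_alloc_bytes(size_t tsize, size_t num_rmid)
{
	return tsize * num_rmid;
}
```

With any state struct larger than a pointer, sizing the allocation via
demo_wrong_tsize() requests fewer bytes than num_rmid entries need, so
indexing by RMID runs past the end of the buffer, and the shortfall
grows with the number of RMIDs.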

Thanks,
-Peter


[1] https://lore.kernel.org/lkml/CALPaoCj8yfzJ=5CkxTPQXc0-WRWpu0xKRX8v4FAWFGQKtXtMUw@xxxxxxxxxxxxxx/




