On Tue, Aug 26, 2025 at 3:42 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 26.08.25 09:19, Yafang Shao wrote:
> > Background
> > ==========
> >
> > Our production servers consistently configure THP to "never" due to
> > historical incidents caused by its behavior. Key issues include:
> >
> > - Increased Memory Consumption
> >   THP significantly raises overall memory usage, reducing the memory
> >   available to workloads.
> >
> > - Latency Spikes
> >   Random latency spikes occur due to frequent memory compaction
> >   triggered by THP.
> >
> > - Lack of Fine-Grained Control
> >   THP tuning is configured globally, making it unsuitable for
> >   containerized environments. When multiple workloads share a host,
> >   enabling THP without per-workload control leads to unpredictable
> >   behavior.
> >
> > Due to these issues, administrators avoid switching to the madvise or
> > always modes unless per-workload THP control is available.
> >
> > To address this, we propose a BPF-based THP policy for flexible
> > adjustment. Additionally, as David mentioned [0], this mechanism can
> > also serve as a policy prototyping tool (test policies via BPF before
> > upstreaming them).
>
> There is a lot going on and most reviewers (including me) are fairly
> busy right now, so getting more detailed review could take a while.
>
> This topic sounds like a good candidate for the bi-weekly MM alignment
> session.
>
> Would you be interested in presenting the current bpf interface, how to
> use it, drawbacks, todos, ... in that forum?

Sure.

>
> David Rientjes, who organizes this meeting, is already on Cc.

DavidR had previously reached out to me about this patchset.

Hello DavidR,

Would September 17 from 9:00–10:00 AM PDT (UTC-7) work for discussing
this topic? If that time isn’t convenient, I’m happy to schedule a later
session; that would also give me some time to prepare a brief slide deck.

On a related note, I’d like to take this opportunity to share a short
proposal on BPF-based NUMA balancing as well.

On our AMD EPYC servers, many services experience significant performance
degradation due to cross-NUMA access. While NUMA balancing can help
mitigate this, its current global enable/disable implementation often
leads to an overall system performance regression. We are exploring the
use of BPF to selectively enable NUMA balancing only for NUMA-sensitive
services, thereby minimizing unintended side effects. A similar approach,
using prctl() or a cgroup interface, has been proposed in [0].

We believe this use case is particularly well suited to a BPF-based
solution, and I’ll briefly outline why in the slides. I’ve included the
developers from [0] in Cc for visibility, in case they are interested in
joining the discussion.

Looking forward to your thoughts.

[0] https://lore.kernel.org/lkml/20250625102337.3128193-1-yu.c.chen@xxxxxxxxx/

--
Regards
Yafang
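
P.S. For context on the control gap described in the background above:
as far as I know, the only per-process THP control in current mainline
kernels is the coarse prctl(PR_SET_THP_DISABLE) opt-out; every
finer-grained knob is host-global sysfs. A minimal sketch of that
existing interface (not part of the proposal):

/*
 * Sketch: the existing per-process THP control is a binary opt-out,
 * inherited by children across fork() and preserved across execve().
 * It cannot express policies such as "THP only for this workload's hot
 * regions", which is the gap the BPF-based proposal targets.
 */
#include <stdio.h>
#include <sys/prctl.h>

int main(void)
{
        /* Opt this process (and its future children) out of THP entirely. */
        if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0)) {
                perror("prctl(PR_SET_THP_DISABLE)");
                return 1;
        }

        /* Read the setting back; the current value is the return value. */
        printf("THP disabled for this process: %ld\n",
               (long)prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0));
        return 0;
}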
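
Similarly, for the NUMA balancing discussion: the control that exists
today is a single host-wide sysctl, /proc/sys/kernel/numa_balancing; the
per-process prctl() and cgroup interfaces in [0] are still proposals, not
merged. A small sketch that just reads the global switch, to make the
all-or-nothing granularity concrete:

/*
 * Sketch: NUMA balancing today is a single host-wide sysctl.
 * 0 = disabled, 1 = enabled (newer kernels also accept 2 for the
 * memory-tiering mode). Flipping it affects every task on the machine,
 * which is exactly the all-or-nothing behavior the BPF proposal aims
 * to avoid.
 */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/kernel/numa_balancing", "r");
        int mode;

        if (!f) {
                perror("/proc/sys/kernel/numa_balancing");
                return 1;
        }
        if (fscanf(f, "%d", &mode) != 1) {
                fclose(f);
                fprintf(stderr, "unexpected sysctl format\n");
                return 1;
        }
        fclose(f);

        printf("NUMA balancing (host-wide): %d\n", mode);
        return 0;
}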