On Tue, Aug 26, 2025 at 3:42 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 26.08.25 09:19, Yafang Shao wrote:
> > Background
> > ==========
> >
> > Our production servers consistently configure THP to "never" due to
> > historical incidents caused by its behavior. Key issues include:
> >
> > - Increased Memory Consumption
> >   THP significantly raises overall memory usage, reducing the memory
> >   available to workloads.
> >
> > - Latency Spikes
> >   Random latency spikes occur due to frequent memory compaction
> >   triggered by THP.
> >
> > - Lack of Fine-Grained Control
> >   THP tuning is configured globally, making it unsuitable for
> >   containerized environments. When multiple workloads share a host,
> >   enabling THP without per-workload control leads to unpredictable
> >   behavior.
> >
> > Due to these issues, administrators avoid switching to the madvise or
> > always modes unless per-workload THP control is available.
> >
> > To address this, we propose a BPF-based THP policy for flexible
> > adjustment. Additionally, as David mentioned [0], this mechanism can
> > also serve as a policy prototyping tool (test policies via BPF before
> > upstreaming them).
>
> There is a lot going on and most reviewers (including me) are fairly
> busy right now, so getting more detailed review could take a while.
>
> This topic sounds like a good candidate for the bi-weekly MM alignment
> session.
>
> Would you be interested in presenting the current bpf interface, how to
> use it, drawbacks, todos, ... in that forum?

Sure.

>
> David Rientjes, who organizes this meeting, is already on Cc.

DavidR had previously reached out to me about this patchset.

Hello DavidR,

Would September 17 from 9:00–10:00 AM PDT (UTC-7) work for discussing
this topic? If that time isn’t convenient, I’m happy to schedule a later
session; that would also give me some time to prepare a brief slide deck.

On a related note, I’d like to take this opportunity to share a short
proposal on BPF-based NUMA balancing as well.

On our AMD EPYC servers, many services experience significant performance
degradation due to cross-NUMA access. While NUMA balancing can help
mitigate this, its current global enable/disable implementation often
leads to an overall system performance regression. We are exploring the
use of BPF to selectively enable NUMA balancing only for NUMA-sensitive
services, thereby minimizing unintended side effects. A similar approach,
using prctl() or a cgroup interface, has been proposed in [0].

We believe this use case is particularly well suited to a BPF-based
solution, and I’ll briefly outline why in the slides. I’ve included the
developers from [0] in Cc for visibility, in case they are interested in
joining the discussion.

Looking forward to your thoughts.

[0] https://lore.kernel.org/lkml/20250625102337.3128193-1-yu.c.chen@xxxxxxxxx/

--
Regards
Yafang
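
P.S. For context on the control gap described in the background above:
as far as I know, the only per-process THP control in current mainline
kernels is the coarse prctl(PR_SET_THP_DISABLE) opt-out; every
finer-grained knob is host-global sysfs. A minimal sketch of that
existing interface (not part of the proposal):

/*
 * Sketch: the existing per-process THP control is a binary opt-out,
 * inherited by children across fork() and preserved across execve().
 * It cannot express policies such as "THP only for this workload's hot
 * regions", which is the gap the BPF-based proposal targets.
 */
#include <stdio.h>
#include <sys/prctl.h>

int main(void)
{
        /* Opt this process (and its future children) out of THP entirely. */
        if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0)) {
                perror("prctl(PR_SET_THP_DISABLE)");
                return 1;
        }

        /* Read the setting back; the current value is the return value. */
        printf("THP disabled for this process: %ld\n",
               (long)prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0));
        return 0;
}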
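
Similarly, for the NUMA balancing discussion: the control that exists
today is a single host-wide sysctl, /proc/sys/kernel/numa_balancing; the
per-process prctl() and cgroup interfaces in [0] are still proposals, not
merged. A small sketch that just reads the global switch, to make the
all-or-nothing granularity concrete:

/*
 * Sketch: NUMA balancing today is a single host-wide sysctl.
 * 0 = disabled, 1 = enabled (newer kernels also accept 2 for the
 * memory-tiering mode). Flipping it affects every task on the machine,
 * which is exactly the all-or-nothing behavior the BPF proposal aims
 * to avoid.
 */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/kernel/numa_balancing", "r");
        int mode;

        if (!f) {
                perror("/proc/sys/kernel/numa_balancing");
                return 1;
        }
        if (fscanf(f, "%d", &mode) != 1) {
                fclose(f);
                fprintf(stderr, "unexpected sysctl format\n");
                return 1;
        }
        fclose(f);

        printf("NUMA balancing (host-wide): %d\n", mode);
        return 0;
}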