On Tue, May 20, 2025 at 5:49 PM Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> wrote:
>
> On Tue, May 20, 2025 at 11:43:11AM +0200, David Hildenbrand wrote:
> > > Conclusion
> > > ----------
> > >
> > > Introducing a new "bpf" mode for BPF-based per-task THP adjustments is the
> > > most effective solution for our requirements. This approach represents a
> > > small but meaningful step toward making THP truly usable—and manageable—in
> > > production environments.
> >
> > A new "bpf" mode sounds way too special.
> >
> > We currently have:
> >
> > never -> never
> > madvise -> MADV_HUGEPAGE, except PR_SET_THP_DISABLE
> > always -> always, except PR_SET_THP_DISABLE and MADV_NOHUGEPAGE
> >
> > Whatever new mode we add, it should honor PR_SET_THP_DISABLE +
> > MADV_NOHUGEPAGE.
> >
> > So, if we want another way to enable things, it would live between "never"
> > and "madvise".
> >
> > I'm wondering how we could make that generic: likely we want this new
> > mechanism to *not* be triggerable by the process itself (madvise).
> >
> > I am not convinced bpf is the answer here ...
>
> Agreed.
>
> I am also very concerned with us inserting BPF bits here - are we not then
> ensuring that we cannot in any way move towards a future where we
> 'automagically' determine what to do?
>
> I don't know what is claimed about BPF, but it strikes me that we're
> establishing a permanent uABI (uAPI?) if we do that and essentially
> promising that THP will continue to operate in a fashion similar to how it
> does now.
>
> While BPF is a wonderful technology, I think we have to be very very careful
> about inserting it in places that consist of -implementation details- that
> we in mm already are planning to move away from.
>
> It's one thing adding BPF in the oomk (simple interface, unlikely to
> change, doesn't really constrain us) or the scheduler (again the hooks are
> by nature reasonably stable), it's quite another sticking it in the heart
> of a part of mm that is undergoing _constant_ change, partly as evidenced
> by the sheer number of series related to THP that are currently on-list.
>
> So while BPF may be the best solution for your needs _right now_, we need
> to be concerned with how things affect the kernel in the future.
>
> I think we really do have to tread very carefully here.

I totally agree with you that the key point here is how to define the API.
As I replied to David, I believe there are two fundamental principles for
adjusting THP policy:

1. Selective Benefit: Some tasks benefit from THP, while others do not.
2. Conditional Safety: THP allocation is safe under certain conditions but
   not others.

Therefore, I believe we can define these APIs around these two principles.
Everything else constitutes implementation details, so the API can remain
stable even if core MM internals need to change.

--
Regards
Yafang