Re: [RFC PATCH v2 0/5] mm, bpf: BPF based THP adjustment

On Tue, May 20, 2025 at 5:44 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> > Conclusion
> > ----------
> >
> > Introducing a new "bpf" mode for BPF-based per-task THP adjustments is the
> > most effective solution for our requirements. This approach represents a
> > small but meaningful step toward making THP truly usable—and manageable—in
> > production environments.
> A new "bpf" mode sounds way too special.

Alternatively, we could simply hook 'madvise' to define a BPF-based policy.

>
> We currently have:
>
> never -> never
> madvise -> MADV_HUGEPAGE, except PR_SET_THP_DISABLE
> always -> always, except PR_SET_THP_DISABLE and MADV_NOHUGEPAGE

If BPF had been invented before THP, we likely would have only three
modes—without PR_SET_THP_DISABLE, MADV_NOHUGEPAGE, or MADV_HUGEPAGE;-)

never -> never
user -> user-defined per-task or per-VMA THP mode selector, based on BPF.
            It can select "never" or "always" for a specific task or VMA.
            The API would look like this:
            bpf->per_task_mode_selector(task);
            bpf->per_vma_mode_selector(vma);
always -> always
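
To make the three-mode idea above a bit more concrete, here is a rough
user-space sketch in plain C. It is not code from the patch set: the
struct and enum names, the "latency_sensitive" field, the 2 MiB
threshold, and the choice to consult the per-VMA selector before the
per-task one are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* The three global modes from the list above. */
enum thp_global_mode { THP_NEVER, THP_USER, THP_ALWAYS };

/* What a selector may answer while the global mode is "user". */
enum thp_choice { THP_CHOICE_NEVER, THP_CHOICE_ALWAYS };

/* Hypothetical stand-ins for task_struct and vm_area_struct. */
struct task { int pid; int latency_sensitive; };
struct vma  { const struct task *owner; size_t length; };

/* Hypothetical ops table mirroring the two selectors named in the mail. */
struct thp_policy_ops {
	enum thp_choice (*per_task_mode_selector)(const struct task *task);
	enum thp_choice (*per_vma_mode_selector)(const struct vma *vma);
};

/* Assumed huge page size, used only by the example VMA policy. */
#define EXAMPLE_HPAGE_SIZE (2UL * 1024 * 1024)

/* Example policy: latency-sensitive tasks opt out of THP. */
static enum thp_choice example_task_selector(const struct task *task)
{
	return task->latency_sensitive ? THP_CHOICE_NEVER : THP_CHOICE_ALWAYS;
}

/* Example policy: only mappings of at least one huge page qualify. */
static enum thp_choice example_vma_selector(const struct vma *vma)
{
	return vma->length >= EXAMPLE_HPAGE_SIZE ? THP_CHOICE_ALWAYS
						  : THP_CHOICE_NEVER;
}

/*
 * Resolve the effective choice for one VMA.
 * Assumptions: "never" and "always" bypass the policy entirely, the
 * per-VMA selector wins over the per-task one, and a missing policy
 * falls back to "never".
 */
static enum thp_choice thp_select(enum thp_global_mode mode,
				  const struct thp_policy_ops *ops,
				  const struct vma *vma)
{
	assert(vma != NULL && vma->owner != NULL);

	if (mode == THP_NEVER)
		return THP_CHOICE_NEVER;
	if (mode == THP_ALWAYS)
		return THP_CHOICE_ALWAYS;
	if (ops == NULL)
		return THP_CHOICE_NEVER;
	if (ops->per_vma_mode_selector)
		return ops->per_vma_mode_selector(vma);
	if (ops->per_task_mode_selector)
		return ops->per_task_mode_selector(vma->owner);
	return THP_CHOICE_NEVER;
}
```

In a real implementation the two callbacks would be BPF programs
attached through whatever stable interface gets agreed on; the function
pointers here only model the shape of the decision.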

However, it’s not too late to introduce a new BPF-based mode for THP,
especially since future adjustments to THP policies are still
expected. Regardless of the specific policy, two fundamental
principles apply:
1. Selective Benefit: Some tasks benefit from THP, while others do not.
2. Conditional Safety: THP allocation is safe under certain conditions
but not others.

Given these constraints, we could abstract stable APIs that allow
users to define custom THP policies tailored to their needs.

>
> Whatever new mode we add, it should honor PR_SET_THP_DISABLE +
> MADV_NOHUGEPAGE.

Yes, BPF only selects different THP modes for different tasks;
nothing else will be changed.
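
As a rough illustration of that ordering, the check could be modeled as
below. This is a user-space sketch with hypothetical names, not kernel
code: the two booleans merely stand in for the state that
PR_SET_THP_DISABLE and MADV_NOHUGEPAGE would leave on the task and the
VMA, and "selected" stands in for whatever the BPF policy returned.

```c
#include <stdbool.h>

/* Hypothetical result of the BPF mode selection for one task or VMA. */
enum thp_sel { THP_SEL_NEVER, THP_SEL_ALWAYS };

/* Hypothetical snapshot of the existing opt-out state. */
struct thp_optout {
	bool prctl_thp_disable;	/* task used PR_SET_THP_DISABLE */
	bool vma_nohugepage;	/* VMA was marked with MADV_NOHUGEPAGE */
};

/*
 * The existing opt-outs are checked first and always win; the BPF
 * selection is consulted only when neither of them is set.
 */
static bool thp_allowed(enum thp_sel selected, const struct thp_optout *state)
{
	if (state->prctl_thp_disable || state->vma_nohugepage)
		return false;
	return selected == THP_SEL_ALWAYS;
}
```

The point of the sketch is only that the policy can narrow or widen THP
use within what the existing controls permit, never override them.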

>
> So, if we want another way to enable things, it would live between
> "never" and "madvise".

Yes, BPF only selects the appropriate THP mode for each task—nothing
else is modified.

>
> I'm wondering how we could make that generic: likely we want this new
> mechanism to *not* be triggerable by the process itself (madvise).
>
> I am not convinced bpf is the answer here ...

I believe the key insight is that we should define a generic, stable
API for BPF-based THP mode selection.


--
Regards
Yafang




