Re: [RFC PATCH v2 0/5] mm, bpf: BPF based THP adjustment

On Tue, May 20, 2025 at 5:49 PM Lorenzo Stoakes
<lorenzo.stoakes@xxxxxxxxxx> wrote:
>
> On Tue, May 20, 2025 at 11:43:11AM +0200, David Hildenbrand wrote:
> > > Conclusion
> > > ----------
> > >
> > > Introducing a new "bpf" mode for BPF-based per-task THP adjustments is the
> > > most effective solution for our requirements. This approach represents a
> > > small but meaningful step toward making THP truly usable—and manageable—in
> > > production environments.
> > A new "bpf" mode sounds way too special.
> >
> > We currently have:
> >
> > never -> never
> > madvise -> MADV_HUGEPAGE, except PR_SET_THP_DISABLE
> > always -> always, except PR_SET_THP_DISABLE and MADV_NOHUGEPAGE
> >
> > Whatever new mode we add, it should honor PR_SET_THP_DISABLE +
> > MADV_NOHUGEPAGE.
> >
> > So, if we want another way to enable things, it would live between "never"
> > and "madvise".
> >
> > I'm wondering how we could make that generic: likely we want this new
> > mechanism to *not* be triggerable by the process itself (madvise).
> >
> > I am not convinced bpf is the answer here ...
>
> Agreed.
>
> I am also very concerned with us inserting BPF bits here - are we not then
> ensuring that we cannot in any way move towards a future where we
> 'automagically' determine what to do?
>
> I don't know what is claimed about BPF, but it strikes me that we're
> establishing a permanent uABI (uAPI?) if we do that and essentially
> promising that THP will continue to operate in a fashion similar to how it
> does now.
>
> While BPF is a wonderful technology, I think we have to be very very careful
> about inserting it in places that consist of -implementation details- that
> we in mm already are planning to move away from.
>
> It's one thing adding BPF in the oomk (simple interface, unlikely to
> change, doesn't really constrain us) or the scheduler (again the hooks are
> by nature reasonably stable), it's quite another sticking it in the heart
> of a part of mm that is undergoing _constant_ change, partly as evidenced
> by the sheer number of series related to THP that are currently on-list.
>
> So while BPF may be the best solution for your needs _right now_, we need
> to be concerned with how things affect the kernel in the future.
>
> I think we really do have to tread very carefully here.

I totally agree with you that the key point here is how we define the
API. As I replied to David, I believe two fundamental principles should
guide how we adjust THP policies:
1. Selective Benefit: Some tasks benefit from THP, while others do not.
2. Conditional Safety: THP allocation is safe under certain conditions
   but not under others.

Therefore, I believe we can define these APIs based on those two
principles. Everything else is an implementation detail that stays
behind the API, even if core MM internals need to change.
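
To make the discussion concrete, the mode semantics David listed above can
be modelled in a few lines of userspace C. This is only a rough sketch, not
kernel code and not the interface proposed in this series. All names here
(thp_mode, thp_advice, thp_allowed(), THP_EXTERNAL, external_ok) are
hypothetical and exist only for illustration. THP_EXTERNAL stands in for a
possible mode between "never" and "madvise" that the process cannot trigger
itself, and external_ok stands in for whatever per-task decision (BPF or
otherwise) ends up expressing "this task benefits and it is safe right now".

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of these exist in the kernel. */
enum thp_mode { THP_NEVER, THP_EXTERNAL, THP_MADVISE, THP_ALWAYS };

/* Per-region hint the process gave via madvise(), if any. */
enum thp_advice { ADVICE_NONE, ADVICE_HUGEPAGE, ADVICE_NOHUGEPAGE };

/*
 * Returns true if a THP may be used for a region.
 *
 * prctl_disabled: the task set PR_SET_THP_DISABLE.
 * external_ok:    assumed outcome of an external per-task policy; only
 *                 consulted in the hypothetical THP_EXTERNAL mode.
 */
static bool thp_allowed(enum thp_mode mode, enum thp_advice advice,
                        bool prctl_disabled, bool external_ok)
{
    /* never -> never */
    if (mode == THP_NEVER)
        return false;

    /* Every other mode honors PR_SET_THP_DISABLE. */
    if (prctl_disabled)
        return false;

    switch (mode) {
    case THP_EXTERNAL:
        /* Not triggerable by madvise; still honors MADV_NOHUGEPAGE. */
        return external_ok && advice != ADVICE_NOHUGEPAGE;
    case THP_MADVISE:
        /* madvise -> only regions marked MADV_HUGEPAGE */
        return advice == ADVICE_HUGEPAGE;
    case THP_ALWAYS:
        /* always -> everything except MADV_NOHUGEPAGE regions */
        return advice != ADVICE_NOHUGEPAGE;
    default:
        return false;
    }
}
```

In this model the two principles reduce to the single external_ok input:
how it gets computed is the implementation detail, while the opt-outs
(PR_SET_THP_DISABLE and MADV_NOHUGEPAGE) keep working the same way in
every mode.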

-- 
Regards
Yafang




