Re: [DISCUSSION] proposed mctl() API

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Tue, 10 Jun 2025 17:26:31 +0100

On Tue, Jun 10, 2025 at 05:00:47PM +0100, Usama Arif wrote:
> On 10/06/2025 16:46, Matthew Wilcox wrote:
> > On Tue, Jun 10, 2025 at 04:30:43PM +0100, Usama Arif wrote:
> >> If we have 2 workloads on the same server, for example, one is a database
> >> where THPs just don't do well, but the other one is AI where THPs do really
> >> well. How will the kernel monitor that the database workload is performing
> >> worse and the AI one isn't?
> > 
> > It can monitor the allocation/access patterns and see who's getting
> > the benefit.  The two workloads are in competition for memory, and
> > we can tell which pages are hot and which cold.
> > 
> > And I don't believe it's a binary anyway.  I bet there are some
> > allocations where the database benefits from having THPs (I mean, I know
> > a database which invented the entire hugetlbfs subsystem so it could
> > use PMD entries and avoid one layer of TLB misses!)
> > 
> 
> Sure, but this is just an example. Workload owners are not going to spend time
> trying to see how each allocation works and, if it's hot, putting it in hugetlbfs.

No, they're not.  It should be automatic.  There are many deficiencies
in the kernel; this is one of them.

> Of course hugetlbfs has its own drawbacks of reserving pages.

Drawback or advantage?  It's a feature.  You're being very strange about
this.  First you want to reserve THPs for some workloads only, then when
given a way to do that you complain that ... you have to reserve hugetlb
pages.  You can't possibly mean both of these things sincerely.