Re: [DISCUSSION] proposed mctl() API

Vlastimil Babka <vbabka@xxxxxxx> · Fri, 30 May 2025 12:31:35 +0200

On 5/29/25 23:14, Johannes Weiner wrote:
> On Thu, May 29, 2025 at 04:28:46PM +0100, Matthew Wilcox wrote:
>> Barry's problem is that we're all nervous about possibly regressing
>> performance on some unknown workloads.  Just try Barry's proposal, see
>> if anyone actually complains or if we're just afraid of our own shadows.
> 
> I actually explained why I think this is a terrible idea. But okay, I
> tried the patch anyway.
> 
> This is 'git log' on a hot kernel repo after a large IO stream:
> 
>                                      VANILLA                      BARRY
> Real time                 49.93 (    +0.00%)         60.36 (   +20.48%)
> User time                 32.10 (    +0.00%)         32.09 (    -0.04%)
> System time               14.41 (    +0.00%)         14.64 (    +1.50%)
> pgmajfault              9227.00 (    +0.00%)      18390.00 (   +99.30%)
> workingset_refault_file  184.00 (    +0.00%)    236899.00 (+127954.05%)
> 
> Clearly we can't generally ignore page cache hits just because the
> mmaps() are intermittent.
> 
> The whole point is to cache across processes and their various
> apertures into a common, long-lived filesystem space.
> 
> Barry knows something about the relationship between certain processes
> and certain files that he could exploit with MADV_COLD-on-exit
> semantics. But that's not something the kernel can safely assume. Not
> without defeating the page cache for an entire class of file accesses.

I've just read the previous threads about Barry's proposal, and if doing this
unconditionally isn't feasible, I'm wondering whether memcg would be a better
interface for opting in to this kind of behavior than either prctl or mctl. I
think it fits what memcg is doing, at least conceptually? The question is
whether the implementation would be feasible, and whether Android puts apps in
separate memcgs...
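
For reference, the "MADV_COLD-on-exit" semantics Johannes mentions above can
already be approximated per mapping from userspace, without any new interface.
What follows is only a minimal sketch under stated assumptions, not Barry's
patch and not a proposed API: read_then_mark_cold() is a hypothetical helper
name, the hint is the existing madvise(MADV_COLD) (Linux 5.4 and later), and
the fallback definition of MADV_COLD as 20 assumes the generic Linux value for
builds against older headers. It illustrates the point that the application,
not the kernel, knows which file pages it will not need again.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef MADV_COLD
#define MADV_COLD 20	/* assumption: generic Linux value, for older headers */
#endif

/*
 * Hypothetical helper: map a file read-only, touch every byte, then hint
 * that the pages are cold before unmapping. Returns 0 on success and
 * stores a byte sum in *sum_out so the reads cannot be optimized away.
 */
static int read_then_mark_cold(const char *path, unsigned long *sum_out)
{
	struct stat st;
	unsigned char *p;
	unsigned long sum = 0;
	size_t len, i;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}
	len = (size_t)st.st_size;

	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping keeps the file referenced */
	if (p == MAP_FAILED)
		return -1;

	for (i = 0; i < len; i++)
		sum += p[i];

	/*
	 * Best-effort hint only: on kernels without MADV_COLD this fails
	 * with EINVAL, which we deliberately ignore.
	 */
	(void)madvise(p, len, MADV_COLD);

	munmap(p, len);
	*sum_out = sum;
	return 0;
}
```

Doing this in the application keeps the policy with the one party that knows
the relationship between the process and the file, which is exactly the
knowledge the kernel cannot safely assume for everyone.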