CCing Tejun - with the right mutt alias this time.

On Wed, Jun 04, 2025 at 08:19:28AM -0400, Johannes Weiner wrote:
> On Fri, May 30, 2025 at 12:31:35PM +0200, Vlastimil Babka wrote:
> > On 5/29/25 23:14, Johannes Weiner wrote:
> > > On Thu, May 29, 2025 at 04:28:46PM +0100, Matthew Wilcox wrote:
> > >> Barry's problem is that we're all nervous about possibly regressing
> > >> performance on some unknown workloads. Just try Barry's proposal, see
> > >> if anyone actually complains or if we're just afraid of our own shadows.
> > >
> > > I actually explained why I think this is a terrible idea. But okay, I
> > > tried the patch anyway.
> > >
> > > This is 'git log' on a hot kernel repo after a large IO stream:
> > >
> > >                                        VANILLA                     BARRY
> > > Real time                  49.93 (     +0.00%)       60.36 (    +20.48%)
> > > User time                  32.10 (     +0.00%)       32.09 (     -0.04%)
> > > System time                14.41 (     +0.00%)       14.64 (     +1.50%)
> > > pgmajfault               9227.00 (     +0.00%)    18390.00 (    +99.30%)
> > > workingset_refault_file   184.00 (     +0.00%)   236899.00 (+127954.05%)
> > >
> > > Clearly we can't generally ignore page cache hits just because the
> > > mmaps() are intermittent.
> > >
> > > The whole point is to cache across processes and their various
> > > apertures into a common, long-lived filesystem space.
> > >
> > > Barry knows something about the relationship between certain processes
> > > and certain files that he could exploit with MADV_COLD-on-exit
> > > semantics. But that's not something the kernel can safely assume. Not
> > > without defeating the page cache for an entire class of file accesses.
> >
> > I've just read the previous threads about Barry's proposal and if doing this
> > always isn't feasible, I'm wondering if memcg would be a better interface to
> > opt-in for this kind of behavior than both prctl or mctl. I think at least
> > conceptually it fits what memcg is doing? The question is if the
> > implementation would be feasible, and if android puts apps in separate memcgs...
>
> CCing Tejun.
>
> Cgroups has been trying to resist flag settings like these. The cgroup
> tree is a nested hierarchical structure designed for dividing up
> system resources. But flag properties don't have natural inheritance
> rules. What does it mean if the parent group says one thing and the
> child says another? Which one has precedence?
>
> Hence the proposal to make it a per-process property that propagates
> through fork() and exec(). This also enables the container usecase (by
> setting the flag in the container launching process), without there
> being any confusion what the *effective* setting for any given process
> in the system is.