On Tue, Aug 19, 2025 at 03:50:49AM +0100, Matthew Wilcox wrote:
> On Mon, Aug 18, 2025 at 05:36:54PM -0700, Boris Burkov wrote:
> > Uncharged pages are tricky to track by their essential "uncharged"
> > nature. To maintain good accounting, introduce a vmstat counter tracking
> > all uncharged pages. Since this is only meaningful when cgroups are
> > configured, only expose the counter when CONFIG_MEMCG is set.
>
> I don't understand why this is needed. Maybe Shakeel had better
> reasoning that wasn't captured in the commit message.
>
> If they're unaccounted, then you can get a good estimate of them
> just by subtracting the number of accounted pages from the number of
> file pages. Sure there's a small race between the two numbers being
> updated, so you might be off by a bit.

My initial thinking was based on Qu's original proposal, which used the
root memcg, where there would be no difference between accounted file
pages and system-wide file pages. However, with Boris's change we can
indeed get the estimate, as you pointed out, by subtracting the number of
accounted file pages from the system-wide number of file pages.

However, I still think we should keep this new metric for performance
reasons. To get the accounted file pages, we need to read memory.stat of
the root memcg, which can be very expensive: it may have to flush the
rstat update trees on all the CPUs in the system. Since this new metric
will be used to calculate system overhead, that high cost will limit how
frequently a user can query the latest stat. I do know of use cases where
users want to query the system overhead at high frequency. One such use
case is keeping a safe buffer on a memory-overcommitted system (Google
does this and measures system overhead every second).
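
For illustration only, the subtraction-based estimate discussed above could be sketched from userspace roughly as follows. This is a minimal sketch, not anything from the patch: the function names are hypothetical, and it assumes cgroup v2 is mounted at /sys/fs/cgroup, that the root memcg exposes memory.stat with a "file" field in bytes, and that /proc/vmstat reports nr_file_pages in pages. The read of the root memory.stat is the step whose cost is the concern here.

```python
import os


def parse_counters(text):
    """Parse 'name value' lines, the layout shared by /proc/vmstat and memory.stat."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def estimate_uncharged_file_pages(vmstat_text, root_memory_stat_text, page_size):
    """Estimate uncharged file pages as system-wide minus memcg-accounted pages."""
    # System-wide file pages, reported in pages.
    nr_file_pages = parse_counters(vmstat_text)["nr_file_pages"]
    # Accounted file memory in the root memcg, reported in bytes.
    charged_bytes = parse_counters(root_memory_stat_text)["file"]
    charged_pages = charged_bytes // page_size
    # The two counters are sampled at different times, so clamp at zero.
    return max(nr_file_pages - charged_pages, 0)


def read_text(path):
    with open(path) as f:
        return f.read()


def estimate_from_system(cgroup_root="/sys/fs/cgroup"):
    """Hypothetical helper; cgroup_root is an assumed cgroup v2 mount point."""
    vmstat_text = read_text("/proc/vmstat")
    # Reading the root memory.stat is the potentially expensive part.
    stat_text = read_text(os.path.join(cgroup_root, "memory.stat"))
    return estimate_uncharged_file_pages(
        vmstat_text, stat_text, os.sysconf("SC_PAGE_SIZE")
    )
```

Calling estimate_from_system() once per second would pay the memory.stat read cost on every sample, which is the overhead a dedicated vmstat counter is meant to avoid.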