On Mon 09-06-25 17:45:05, Shakeel Butt wrote:
> On Mon, Jun 09, 2025 at 05:17:58PM -0700, Andrew Morton wrote:
> > On Mon, 9 Jun 2025 10:56:46 +0200 Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >
> > > On 6/9/25 10:52 AM, Vlastimil Babka wrote:
> > > > On 6/9/25 10:31 AM, Ritesh Harjani (IBM) wrote:
> > > >> Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> writes:
> > > >>
> > > >>> On 2025/6/9 15:35, Michal Hocko wrote:
> > > >>>> On Mon 09-06-25 10:57:41, Ritesh Harjani wrote:
> > > >>>>>
> > > >>>>> Any reason why we dropped the Fixes tag? I see there were a series of
> > > >>>>> discussion on v1 and it got concluded that the fix was correct, then why
> > > >>>>> drop the fixes tag?
> > > >>>>
> > > >>>> This seems more like an improvement than a bug fix.
> > > >>>
> > > >>> Yes. I don't have a strong opinion on this, but we (Alibaba) will
> > > >>> backport it manually,
> > > >>>
> > > >>> because some of user-space monitoring tools depend
> > > >>> on these statistics.
> > > >>
> > > >> That sounds like a regression then, isn't it?
> > > >
> > > > Hm if counters were accurate before f1a7941243c1 and not afterwards, and
> > > > this is making them accurate again, and some userspace depends on it,
> > > > then Fixes: and stable is probably warranted then. If this was just a
> > > > perf improvement, then not. But AFAIU f1a7941243c1 was the perf
> > > > improvement...
> > >
> > > Dang, should have re-read the commit log of f1a7941243c1 first. It seems
> > > like the error margin due to batching existed also before f1a7941243c1.
> > >
> > > " This patch converts the rss_stats into percpu_counter to convert the
> > > error margin from (nr_threads * 64) to approximately (nr_cpus ^ 2)."
> > >
> > > so if on some systems this means worse margin than before, the above
> > > "if" chain of thought might still hold.
> >
> > f1a7941243c1 seems like a good enough place to tell -stable
> > maintainers where to insert the patch (why does this sound rude).
> >
> > The patch is simple enough. I'll add fixes:f1a7941243c1 and cc:stable
> > and, as the problem has been there for years, I'll leave the patch in
> > mm-unstable so it will eventually get into LTS, in a well tested state.
>
> One thing f1a7941243c1 noted was that the percpu counter conversion
> enabled us to get more accurate stats with some cpu cost and in this
> patch Baolin has shown that the cpu cost of accurate stats is
> reasonable, so seems safe for stable backport. Also it seems like
> multiple users are impacted by this issue, so I am fine with stable
> backport.

Fair point.
--
Michal Hocko
SUSE Labs