On Sat, May 24, 2025 at 2:07 AM Chen, Yu C <yu.c.chen@xxxxxxxxx> wrote:
>
> Hi Shakeel,
>
> On 5/24/2025 7:42 AM, Shakeel Butt wrote:
> > On Fri, May 23, 2025 at 08:51:15PM +0800, Chen Yu wrote:
> >> On systems with NUMA balancing enabled, it has been found
> >> that tracking task activities resulting from NUMA balancing
> >> is beneficial. NUMA balancing employs two mechanisms for task
> >> migration: one is to migrate a task to an idle CPU within its
> >> preferred node, and the other is to swap tasks located on
> >> different nodes when they are on each other's preferred nodes.
> >>
> >> The kernel already provides NUMA page migration statistics in
> >> /sys/fs/cgroup/mytest/memory.stat and /proc/{PID}/sched. However,
> >> it lacks statistics regarding task migration and swapping.
> >> Therefore, relevant counts for task migration and swapping should
> >> be added.
> >>
> >> The following two new fields:
> >>
> >> numa_task_migrated
> >> numa_task_swapped
> >>
> >> will be shown in /sys/fs/cgroup/{GROUP}/memory.stat, /proc/{PID}/sched
> >> and /proc/vmstat
> >
> > Hmm these are scheduler events, how are these relevant to memory cgroup
> > or vmstat?
> > Any reason to not expose these in cpu.stat?
> >
>
> I understand that in theory they are scheduling activities.
> The reason for including these statistics here was mainly that
> I assumed there is a close relationship between page migration
> and task migration in Numa Balance. Specifically, task migration
> is triggered when page migration fails.
> Placing these statistics closer to the existing Numa Balance page
> statistics in /sys/fs/cgroup/{GROUP}/memory.stat and /proc/vmstat
> may help users query relevant data from a single file, avoiding
> the need to search through scattered files.
> Notably, these events are associated with a task’s working set
> (footprint) rather than pure CPU cycles IMO.
> I took a look at
> the cpu_cfs_stat_show() for cpu.stat, it seems that a lot of
> code is needed if we want to expose them in cpu.stat, while
> reusing existing interface of count_memcg_event_mm() is simpler.

Let me address two of your points first:

(1) cpu.stat currently contains cpu cycles stats. I don't see an issue
with adding these new events to it, since memory.stat exposes both
stats and events as well.

(2) You can still use count_memcg_event_mm() and related infra while
exposing the stats/events in cpu.stat.

Now, your point on having related stats within a single interface is
more convincing. Let me ask you a couple of simple questions:

I am not well versed in numa migration, can you expand a bit more on
these two events (numa_task_migrated & numa_task_swapped)? How are
these related to numa memory migration? You mentioned these events
happen on page migration failure, can you please give an end-to-end
flow/story of all these events happening on a timeline?

Besides that, do you think there might be some other scheduling events
(maybe unrelated to numa balancing) which might be suitable for
memory.stat? Basically I am trying to find out whether having sched
events in memory.stat would be an exception for numa balancing or
something more general.

thanks,
Shakeel