On Fri, May 23, 2025 at 08:51:01PM +0800, Chen Yu wrote:
> From: Libo Chen <libo.chen@xxxxxxxxxx>
>
> Task swapping is triggered when there are no idle CPUs in
> task A's preferred node. In this case, the NUMA load balancer
> chooses a task B on A's preferred node and swaps B with A. This
> helps improve NUMA locality without introducing load imbalance
> between nodes. In the current implementation, B's NUMA node
> preference is not checked, so a kernel thread
> might be incorrectly chosen as B. However, kernel threads and
> user-space threads that do not have an mm are not supposed to be
> covered by NUMA balancing, because NUMA balancing only considers
> user pages via VMAs.
>
> Following Peter's suggestion for fixing this issue, we use
> PF_KTHREAD to skip kernel threads. curr->mm is also checked
> because user_mode_thread() might create a
> user thread without an mm. As per Prateek's analysis, after
> adding the PF_KTHREAD check, there is no need to further check
> the PF_IDLE flag:
> "
> - play_idle_precise() already ensures PF_KTHREAD is set before adding
> PF_IDLE
>
> - cpu_startup_entry() is only called from the startup thread which
> should be marked with PF_KTHREAD (based on my understanding looking at
> commit cff9b2332ab7 ("kernel/sched: Modify initial boot task idle
> setup"))
> "
>
> In summary, the check in task_numa_compare() now aligns with
> task_tick_numa().
>
> Suggested-by: Michal Koutny <mkoutny@xxxxxxxx>
> Tested-by: Ayush Jain <Ayush.jain3@xxxxxxx>
> Signed-off-by: Libo Chen <libo.chen@xxxxxxxxxx>
> Tested-by: Venkat Rao Bagalkote <venkat88@xxxxxxxxxxxxx>
> Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx>

Reviewed-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>