On 5/28/2025 1:48 AM, Shakeel Butt wrote:
On Sun, May 25, 2025 at 08:35:24PM +0800, Chen, Yu C wrote:
On 5/25/2025 1:32 AM, Shakeel Butt wrote:
[...]
Can you please give an end-to-end flow/story of all these events
happening on a timeline?
Yes, sure, let me have a try.
The goal of NUMA balancing is to co-locate a task and its
memory pages on the same NUMA node. There are two strategies:
migrate the pages to the task's node, or migrate the task to
the node where its pages reside.
Suppose a task p1 is running on Node 0, but its pages are
located on Node 1. NUMA page fault statistics for p1 reveal
its "page footprint" across nodes. If NUMA balancing detects
that most of p1's pages are on Node 1:
1. Page Migration Attempt:
NUMA balancing first tries to migrate p1's pages to Node 0.
The numa_page_migrate counter increments.
2. Task Migration Strategies:
After the page migration finishes, NUMA balancing checks every
second to see whether p1 can be migrated to Node 1.
Case 2.1: Idle CPU Available
If Node 1 has an idle CPU, p1 is directly scheduled there. This event is
logged as numa_task_migrated.
Case 2.2: No Idle CPU (Task Swap)
If all CPUs on Node 1 are busy, direct migration could cause CPU contention
or load imbalance. Instead:
NUMA balancing selects a candidate task p2 on Node 1 that prefers
Node 0 (e.g., due to its own page footprint).
p1 and p2 are swapped. This cross-node swap is recorded as
numa_task_swapped.
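The timeline above can be sketched as a toy model in C. This is a minimal sketch under stated assumptions, not the scheduler's code: struct toy_task, struct numa_counters, toy_preferred_node(), toy_migrate_pages() and toy_task_move() are all hypothetical names, the per-node page array stands in for the "page footprint" that the fault statistics estimate, and the idle-CPU count and candidate list are passed in rather than computed. Only the three counter names come from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_NODES 2	/* assumption: a two-node system, as in the example */

/* Hypothetical counters named after the ones discussed above (toy model). */
struct numa_counters {
	unsigned long numa_page_migrate;
	unsigned long numa_task_migrated;
	unsigned long numa_task_swapped;
};

/* Hypothetical task: where it runs and how many of its pages sit on each node. */
struct toy_task {
	int node;			/* node the task currently runs on */
	unsigned long pages[NR_NODES];	/* the task's "page footprint" */
};

/* Node holding most of the task's pages (ties go to the lower node). */
static int toy_preferred_node(const struct toy_task *t)
{
	int best = 0;

	for (int n = 1; n < NR_NODES; n++)
		if (t->pages[n] > t->pages[best])
			best = n;
	return best;
}

/* Step 1: pull up to nr pages from node src to the node the task runs on. */
static void toy_migrate_pages(struct toy_task *t, int src, unsigned long nr,
			      struct numa_counters *c)
{
	assert(src >= 0 && src < NR_NODES);
	if (src == t->node)
		return;			/* pages are already local */
	if (nr > t->pages[src])
		nr = t->pages[src];
	t->pages[src] -= nr;
	t->pages[t->node] += nr;
	c->numa_page_migrate += nr;	/* one increment per migrated page */
}

/* Step 2: try to move t to dst, directly (case 2.1) or by a swap (case 2.2). */
static bool toy_task_move(struct toy_task *t, int dst, int idle_cpus_on_dst,
			  struct toy_task *others, size_t nr_others,
			  struct numa_counters *c)
{
	assert(dst >= 0 && dst < NR_NODES);
	if (t->node == dst)
		return false;

	/* Case 2.1: dst has an idle CPU, so move the task there directly. */
	if (idle_cpus_on_dst > 0) {
		t->node = dst;
		c->numa_task_migrated++;
		return true;
	}

	/* Case 2.2: dst is busy; swap with a task on dst that prefers our node. */
	for (size_t i = 0; i < nr_others; i++) {
		struct toy_task *q = &others[i];

		if (q->node == dst && toy_preferred_node(q) == t->node) {
			q->node = t->node;
			t->node = dst;
			c->numa_task_swapped++;
			return true;
		}
	}
	return false;	/* no idle CPU and no suitable partner: retry later */
}
```

For example, with p1 on Node 0 holding pages {10, 90} and p2 on Node 1 holding pages {80, 20}, calling toy_task_move(&p1, 1, 0, &p2, 1, &c) swaps the two tasks and bumps numa_task_swapped. The model deliberately ignores how a real implementation weighs load and capacity when choosing a swap partner; it only shows which path bumps which counter.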
Thanks for the explanation, this is really helpful and I would like this
to be included in the commit message.
OK, I just sent out v6 with the enhanced commit message.
Thanks,
Chenyu