On 2025/6/4 01:29, Shakeel Butt wrote:
On Tue, Jun 03, 2025 at 04:48:08PM +0200, Michal Hocko wrote:
On Tue 03-06-25 22:22:46, Baolin Wang wrote:
Let me try to clarify further.
The 'mm->rss_stat' counters are updated via add_mm_counter() and
dec/inc_mm_counter(), which are all wrappers around
percpu_counter_add_batch(). In percpu_counter_add_batch(), per-CPU batch
caching is used to avoid 'fbc->lock' contention.
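For reference, the batching works roughly like the sketch below (a
simplified version of what lib/percpu_counter.c does, not the exact
upstream implementation): each update accumulates in a per-CPU counter,
and 'fbc->lock' is only taken once the local delta reaches the batch
size, so most add_mm_counter() calls never touch the shared lock.

#include <linux/percpu_counter.h>

void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
{
	s64 count;
	unsigned long flags;

	preempt_disable();
	count = __this_cpu_read(*fbc->counters) + amount;
	if (abs(count) >= batch) {
		/* Slow path: fold the local delta into the shared count. */
		raw_spin_lock_irqsave(&fbc->lock, flags);
		fbc->count += count;
		__this_cpu_sub(*fbc->counters, count - amount);
		raw_spin_unlock_irqrestore(&fbc->lock, flags);
	} else {
		/* Fast path: purely per-CPU, no shared lock taken. */
		this_cpu_add(*fbc->counters, amount);
	}
	preempt_enable();
}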
OK, this is exactly the line of argument I was looking for. If _all_
updates done in the kernel use batching, and the lock is therefore only
taken every N (percpu_counter_batch) updates, then the risk of lock
contention is reduced. This is worth noting in the changelog.
OK.
This patch changes task_mem() and task_statm() to read the accurate mm
counters under 'fbc->lock', but this will not exacerbate kernel
'mm->rss_stat' lock contention, due to the percpu batch caching of the mm
counters.
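To illustrate the reader side, here is a minimal sketch, assuming the
/proc readers move from the lockless, approximate
percpu_counter_read_positive() to a locked sum along the lines of
percpu_counter_sum_positive() (the helper names below are illustrative,
not necessarily the ones the patch uses):

#include <linux/mm_types.h>
#include <linux/percpu_counter.h>

/* Old behaviour: lockless read that can miss per-CPU deltas of up to
 * batch * num_online_cpus() pages. */
static unsigned long mm_counter_approx(struct mm_struct *mm, int member)
{
	return percpu_counter_read_positive(&mm->rss_stat[member]);
}

/* New behaviour in task_mem()/task_statm(): take 'fbc->lock' and fold in
 * every CPU's pending delta to get an accurate value. */
static unsigned long mm_counter_exact(struct mm_struct *mm, int member)
{
	return percpu_counter_sum_positive(&mm->rss_stat[member]);
}

Since the writers only take 'fbc->lock' on the batching slow path, the
readers holding it for the sum should rarely collide with updates.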
You might argue that my test cases cannot demonstrate actual lock
contention, but they have already shown that there is no significant
'fbc->lock' contention when the kernel updates 'mm->rss_stat'.
I was arguing that `top -d 1' doesn't really represent a potentially
adverse usage. These proc files are generally readable, so I would expect
something like a busy-loop read while the process tries to update its
counters, to see the worst-case scenario. If that is barely visible, then
we can conclude that normal use wouldn't even notice.
OK.
Baolin, please run a stress-ng command that stresses minor anon page
faults in multiple threads, and then run multiple bash scripts which cat
/proc/pidof(stress-ng)/status. That should show how much the stress-ng
process is impacted by the parallel status readers versus without them.
Sure, thanks Shakeel. I ran stress-ng with the 'stress-ng --fault 32
--perf -t 1m' command, while simultaneously running the script below to
read /proc/pidof(stress-ng)/status for each thread.

From the following data, I did not observe any obvious impact of this
patch on the stress-ng tests while /proc/pidof(stress-ng)/status was
being read repeatedly.
w/o patch
stress-ng: info: [6891] 3,993,235,331,584 CPU Cycles          59.767 B/sec
stress-ng: info: [6891] 1,472,101,565,760 Instructions        22.033 B/sec (0.369 instr. per cycle)
stress-ng: info: [6891]        36,287,456 Page Faults Total    0.543 M/sec
stress-ng: info: [6891]        36,287,456 Page Faults Minor    0.543 M/sec

w/ patch
stress-ng: info: [6872] 4,018,592,975,968 CPU Cycles          60.177 B/sec
stress-ng: info: [6872] 1,484,856,150,976 Instructions        22.235 B/sec (0.369 instr. per cycle)
stress-ng: info: [6872]        36,547,456 Page Faults Total    0.547 M/sec
stress-ng: info: [6872]        36,547,456 Page Faults Minor    0.547 M/sec
=========================
#!/bin/bash

# Get the PIDs of stress-ng processes
PIDS=$(pgrep stress-ng)

# Loop through each PID and monitor /proc/[pid]/status in the background
for PID in $PIDS; do
    while true; do
        cat /proc/$PID/status
        usleep 100000
    done &
done