On 5/13/25 05:13, Shakeel Butt wrote:
> The function memcg_rstat_updated() is used to track the memcg stats
> updates for optimizing the flushes. At the moment, it is not re-entrant
> safe and the callers disable irqs before calling it. However, to achieve
> the goal of updating memcg stats without disabling irqs,
> memcg_rstat_updated() needs to be re-entrant safe against irqs.
>
> This patch makes memcg_rstat_updated() re-entrant safe against irqs.
> However it is using atomic_* ops which, on x86, add a lock prefix to
> the instructions. Since this is per-cpu data, the this_cpu_* ops are
> preferred. However, the percpu pointer is stored in struct mem_cgroup,
> and doing the upward traversal through struct mem_cgroup may cause two
> cache misses as compared to traversing through the struct
> memcg_vmstats_percpu pointer.
>
> NOTE: explore if there is an atomic_* ops alternative without the lock
> prefix.

local_t might be what you want here:
https://docs.kernel.org/core-api/local_ops.html

Or maybe just add __percpu to parent, like this?

struct memcg_vmstats_percpu {
        ...
        struct memcg_vmstats_percpu __percpu *parent;
        ...
};

Yes, it means each cpu's struct memcg_vmstats_percpu instance will
actually store the same value (the percpu offset) instead of a
cpu-specific parent pointer, which might seem wasteful. But AFAIK
this_cpu_* is optimized enough, thanks to the segment register usage,
that it doesn't matter? It shouldn't cause the extra cache misses you
worry about, IIUC?

With that I think you could refactor the code to use e.g.
this_cpu_add_return() and this_cpu_xchg() on stats_updates, and obtain
the parent "pointer" in a way that's also compatible with these
operations.

That is, unless we also want nmi safety; then we're back to the issue
of the previous series...
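Roughly like this, perhaps (a completely untested sketch just to
illustrate the idea; names like statc_pcpu are made up, stats_updates
goes back to a plain int since this_cpu_* ops don't need atomic_t, and
parent now holds the percpu offset):

static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
{
        struct memcg_vmstats_percpu __percpu *statc_pcpu;
        struct memcg_vmstats_percpu *statc;
        int cpu, stats_updates;

        if (!val)
                return;

        cpu = get_cpu();
        cgroup_rstat_updated(memcg->css.cgroup, cpu);

        /* Walk percpu offsets; dereference only for the local cpu. */
        for (statc_pcpu = memcg->vmstats_percpu; statc_pcpu;
             statc_pcpu = statc->parent) {
                statc = this_cpu_ptr(statc_pcpu);
                if (memcg_vmstats_needs_flush(statc->vmstats))
                        break;

                /* irq-safe rmw, no lock prefix on x86 */
                stats_updates = this_cpu_add_return(statc_pcpu->stats_updates,
                                                    abs(val));
                if (stats_updates < MEMCG_CHARGE_BATCH)
                        continue;

                stats_updates = this_cpu_xchg(statc_pcpu->stats_updates, 0);
                if (stats_updates)
                        atomic64_add(stats_updates,
                                     &statc->vmstats->stats_updates);
        }
        put_cpu();
}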
> Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> ---
>  mm/memcontrol.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6cfa3550f300..2c4c095bf26c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -503,7 +503,7 @@ static inline int memcg_events_index(enum vm_event_item idx)
>
>  struct memcg_vmstats_percpu {
>          /* Stats updates since the last flush */
> -        unsigned int stats_updates;
> +        atomic_t stats_updates;
>
>          /* Cached pointers for fast iteration in memcg_rstat_updated() */
>          struct memcg_vmstats_percpu *parent;
> @@ -590,12 +590,15 @@ static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
>  static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
>  {
>          struct memcg_vmstats_percpu *statc;
> -        int cpu = smp_processor_id();
> -        unsigned int stats_updates;
> +        int cpu;
> +        int stats_updates;
>
>          if (!val)
>                  return;
>
> +        /* Don't assume callers have preemption disabled. */
> +        cpu = get_cpu();
> +
>          cgroup_rstat_updated(memcg->css.cgroup, cpu);
>          statc = this_cpu_ptr(memcg->vmstats_percpu);
>          for (; statc; statc = statc->parent) {
> @@ -607,14 +610,16 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
>                  if (memcg_vmstats_needs_flush(statc->vmstats))
>                          break;
>
> -                stats_updates = READ_ONCE(statc->stats_updates) + abs(val);
> -                WRITE_ONCE(statc->stats_updates, stats_updates);
> +                stats_updates = atomic_add_return(abs(val), &statc->stats_updates);
>                  if (stats_updates < MEMCG_CHARGE_BATCH)
>                          continue;
>
> -                atomic64_add(stats_updates, &statc->vmstats->stats_updates);
> -                WRITE_ONCE(statc->stats_updates, 0);
> +                stats_updates = atomic_xchg(&statc->stats_updates, 0);
> +                if (stats_updates)
> +                        atomic64_add(stats_updates,
> +                                     &statc->vmstats->stats_updates);
>          }
> +        put_cpu();
>  }
>
>  static void __mem_cgroup_flush_stats(struct mem_cgroup *memcg, bool force)
> @@ -4155,7 +4160,7 @@ static void mem_cgroup_css_rstat_flush(struct cgroup_subsys_state *css, int cpu)
>                          mem_cgroup_stat_aggregate(&ac);
>
>          }
> -        WRITE_ONCE(statc->stats_updates, 0);
> +        atomic_set(&statc->stats_updates, 0);
>          /* We are in a per-cpu loop here, only do the atomic write once */
>          if (atomic64_read(&memcg->vmstats->stats_updates))
>                  atomic64_set(&memcg->vmstats->stats_updates, 0);
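And FWIW, if the local_t route I mentioned at the top looks more
attractive, the hot path would be roughly as below (again an untested
sketch, assuming stats_updates becomes local_t and the surrounding loop
stays as in your patch; one caveat: local_xchg() on x86 compiles to an
implicitly locked xchg, so only the add side avoids the lock prefix):

#include <asm/local.h>

struct memcg_vmstats_percpu {
        /* Stats updates since the last flush */
        local_t stats_updates;
        ...
};

        ...
        /* irq-safe on the local cpu, plain (non-locked) rmw on x86 */
        stats_updates = local_add_return(abs(val), &statc->stats_updates);
        if (stats_updates < MEMCG_CHARGE_BATCH)
                continue;

        stats_updates = local_xchg(&statc->stats_updates, 0);
        if (stats_updates)
                atomic64_add(stats_updates, &statc->vmstats->stats_updates);
        ...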