On Fri, Jul 04, 2025 at 11:08:04AM -0700, Shakeel Butt wrote:
> Before the commit 36df6e3dbd7e ("cgroup: make css_rstat_updated nmi
> safe"), the struct llist_node is expected to be private to the one
> inserting the node to the lockless list or the one removing the node
> from the lockless list. After the mentioned commit, the llist_node in
> the rstat code is per-cpu shared between the stacked contexts i.e.
> process, softirq, hardirq & nmi. It is possible the compiler may tear
> the loads or stores of llist_node. Let's avoid that.
>
> KCSAN reported the following race:
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 60 UID: 0 PID: 5425 ... 6.16.0-rc3-next-20250626 #1 NONE
> Tainted: [E]=UNSIGNED_MODULE
> Hardware name: ...
> ==================================================================
> ==================================================================
> BUG: KCSAN: data-race in css_rstat_flush / css_rstat_updated
> write to 0xffffe8fffe1c85f0 of 8 bytes by task 1061 on cpu 1:
> css_rstat_flush+0x1b8/0xeb0
> __mem_cgroup_flush_stats+0x184/0x190
> flush_memcg_stats_dwork+0x22/0x50
> process_one_work+0x335/0x630
> worker_thread+0x5f1/0x8a0
> kthread+0x197/0x340
> ret_from_fork+0xd3/0x110
> ret_from_fork_asm+0x11/0x20
> read to 0xffffe8fffe1c85f0 of 8 bytes by task 3551 on cpu 15:
> css_rstat_updated+0x81/0x180
> mod_memcg_lruvec_state+0x113/0x2d0
> __mod_lruvec_state+0x3d/0x50
> lru_add+0x21e/0x3f0
> folio_batch_move_lru+0x80/0x1b0
> __folio_batch_add_and_move+0xd7/0x160
> folio_add_lru_vma+0x42/0x50
> do_anonymous_page+0x892/0xe90
> __handle_mm_fault+0xfaa/0x1520
> handle_mm_fault+0xdc/0x350
> do_user_addr_fault+0x1dc/0x650
> exc_page_fault+0x5c/0x110
> asm_exc_page_fault+0x22/0x30
> value changed: 0xffffe8fffe18e0d0 -> 0xffffe8fffe1c85f0
>
> $ ./scripts/faddr2line vmlinux css_rstat_flush+0x1b8/0xeb0
> css_rstat_flush+0x1b8/0xeb0:
> init_llist_node at include/linux/llist.h:86
> (inlined by) llist_del_first_init at include/linux/llist.h:308
> (inlined by) css_process_update_tree at kernel/cgroup/rstat.c:148
> (inlined by) css_rstat_updated_list at kernel/cgroup/rstat.c:258
> (inlined by) css_rstat_flush at kernel/cgroup/rstat.c:389
>
> $ ./scripts/faddr2line vmlinux css_rstat_updated+0x81/0x180
> css_rstat_updated+0x81/0x180:
> css_rstat_updated at kernel/cgroup/rstat.c:90 (discriminator 1)
>
> These are expected race and a simple READ_ONCE/WRITE_ONCE resolves these
> reports. However let's add comments to explain the race and the need for
> memory barriers if stronger guarantees are needed.
>
> More specifically the rstat updater and the flusher can race and cause a
> scenario where the stats updater skips adding the css to the lockless
> list but the flusher might not see those updates done by the skipped
> updater. This is benign race and the subsequent flusher will flush those
> stats and at the moment there aren't any rstat users which are not fine
> with this kind of race. However some future user might want more
> stricter guarantee, so let's add appropriate comments to ease the job of
> future users.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> Reviewed-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Fixes: 36df6e3dbd7e ("cgroup: make css_rstat_updated nmi safe")

Applied to cgroup/for-6.17. Sorry about the delay. I'm on a vacation and
ended up a lot more offline than I expected to be.

Thanks.

-- 
tejun
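
For readers following the thread, here is a minimal userspace sketch of the kind of READ_ONCE/WRITE_ONCE annotation the quoted description refers to. It is not the applied patch. Every sketch_/SKETCH_ name is hypothetical, the macros are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(), and the "node points to itself when off-list" convention is an assumption based on the init_llist_node frame in the trace above. It relies on the GCC/Clang __typeof__ extension.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE().
 * A volatile access asks the compiler to emit a single access of the
 * full width instead of splitting (tearing) or repeating it. They do
 * NOT order the access against other memory operations.
 */
#define SKETCH_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical node type, modelled loosely on struct llist_node. */
struct sketch_node {
	struct sketch_node *next;
};

/*
 * Mark a node as "not on any list" by pointing it at itself (assumed
 * convention). The flusher side would do this after removing a node.
 */
static inline void sketch_init_node(struct sketch_node *node)
{
	SKETCH_WRITE_ONCE(node->next, node);
}

/*
 * Updater-side check: is the node already queued? This load can race
 * with sketch_init_node() running in another context, which is why it
 * is annotated. A stale answer is tolerated in this sketch; a caller
 * needing a stronger guarantee would have to add memory barriers.
 */
static inline bool sketch_on_list(struct sketch_node *node)
{
	return SKETCH_READ_ONCE(node->next) != node;
}
```

The annotations only keep the compiler from tearing the pointer load and store. As the quoted text notes, an updater can still skip queueing while a concurrent flusher misses its update, and that case is left to a subsequent flush.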