On Wed, Apr 30, 2025 at 06:14:28AM -0700, Yosry Ahmed wrote: [...] > > + > > + if (!_css_rstat_cpu_trylock(css, cpu, &flags)) { > > > IIUC this trylock will only fail if a BPF program runs in NMI context > and tries to update cgroup stats, interrupting a context that is already > holding the lock (i.e. updating or flushing stats). > Correct (though note that flushing side can be on a different CPU). > How often does this happen in practice tho? Is it worth the complexity? This is about correctness, so even a chance of occurance need the solution. > > I wonder if it's better if we make css_rstat_updated() inherently > lockless instead. > > What if css_rstat_updated() always just adds to a lockless tree, Here I assume you meant lockless list instead of tree. > and we > defer constructing the proper tree to the flushing side? This should > make updates generally faster and avoids locking or disabling interrupts > in the fast path. We essentially push more work to the flushing side. > > We may be able to consolidate some of the code too if all the logic > manipulating the tree is on the flushing side. > > WDYT? Am I missing something here? > Yes this can be done but I don't think we need to tie that to current series. I think we can start with lockless in the nmi context and then iteratively make css_rstat_updated() lockless for all contexts.