On Tue, Jun 10, 2025 at 11:26:35AM -1000, Tejun Heo wrote:
> Hello,
>
> On Mon, Jun 09, 2025 at 03:56:10PM -0700, Shakeel Butt wrote:
> ...
> > +	self = &rstatc->lnode;
> > +	if (!try_cmpxchg(&(rstatc->lnode.next), &self, NULL))
> >  		return;
> >
> > +	llist_add(&rstatc->lnode, lhead);
>
> I may be missing something but when you say multiple inserters, you mean the
> function being re-entered from stacked contexts - ie. process context, BH,
> irq, nmi?

Yes.

> If so, would it make sense to make the nmi and non-nmi paths use
> separate lnode? In non-nmi path, we can just disable irq and test whether
> lnode is empty and add it. nmi path can just test whether its lnode is empty
> and add it. I suppose nmi's don't nest, right? If they do, we can do
> try_cmpxchg() there I suppose.
>
> While the actual addition to the list would be relatively low frequency,
> css_rstat_updated() itself can be called pretty frequently. Before, the hot
> path was early exit after data_race(css_rstat_cpu(css, cpu)->updated_next).
> After, the hot path is now !try_cmpxchg() which doesn't seem great.
>

A couple of lines above, I have an llist_on_list(&rstatc->lnode) check, which
should be as cheap as data_race(css_rstat_cpu(css, cpu)->updated_next).

So, I can add separate lnodes for the nmi and non-nmi contexts (with irqs
disabled), but I think that is not needed. I actually ran the netperf
benchmark (36 parallel instances) and I see no significant difference with
and without the patch.

Thanks for taking a look.
Shakeel
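
P.S. For readers following along, here is a minimal userspace sketch of the
"cheap check, then claim, then add" pattern discussed above. It is only an
illustration: it uses C11 atomics instead of the kernel's llist and
try_cmpxchg(), and all of the names (struct node, struct list_head,
node_on_list(), mark_updated(), ...) are hypothetical and not taken from the
patch. It assumes the same convention as llist_on_list(), where a node that
points to itself is not on any list.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for llist_node / llist_head. */
struct node {
	_Atomic(struct node *) next;
};

struct list_head {
	_Atomic(struct node *) first;
};

static void list_init(struct list_head *h)
{
	atomic_store(&h->first, NULL);
}

/* Off-list convention: the node points to itself. */
static void node_init(struct node *n)
{
	atomic_store(&n->next, n);
}

/* Cheap hot-path check: a plain load, no read-modify-write. */
static bool node_on_list(struct node *n)
{
	return atomic_load_explicit(&n->next, memory_order_relaxed) != n;
}

/* Lock-free push onto the head of the list. */
static void list_push(struct list_head *h, struct node *n)
{
	struct node *first = atomic_load(&h->first);

	do {
		atomic_store(&n->next, first);
	} while (!atomic_compare_exchange_weak(&h->first, &first, n));
}

/*
 * Returns true if this caller added the node. With stacked contexts
 * re-entering this function for the same node, only the one that flips
 * next from "self" to NULL goes on to push; the others return early.
 */
static bool mark_updated(struct list_head *h, struct node *n)
{
	struct node *self = n;

	if (node_on_list(n))
		return false;	/* common case: already queued */

	if (!atomic_compare_exchange_strong(&n->next, &self, NULL))
		return false;	/* lost the race to another context */

	list_push(h, n);
	return true;
}
```

In this sketch the compare-and-exchange only runs when the node looks
off-list, so repeated calls for an already queued node take the plain-load
early exit.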