On Thu, Jul 17, 2025 at 10:46:46AM -0400, Mathieu Desnoyers wrote:
> On 2025-07-17 09:14, Mathieu Desnoyers wrote:
> > On 2025-07-16 18:54, Paul E. McKenney wrote:
> [...]
> >
> > 2) I think I'm late to the party in reviewing srcu-fast, I'll
> >    go have a look :)
>
> OK, I'll bite. :) Please let me know where I'm missing something:
>
> Looking at srcu-lite and srcu-fast, I understand that they fundamentally
> depend on a trick we published here https://lwn.net/Articles/573497/
> "The RCU-barrier menagerie" that allows turning, e.g. this Dekker:
>
> volatile int x = 0, y = 0
>
> CPU 0              CPU 1
>
> x = 1              y = 1
> smp_mb             smp_mb
> r2 = y             r4 = x
>
> BUG_ON(r2 == 0 && r4 == 0)
>
> into
>
> volatile int x = 0, y = 0
>
> CPU 0              CPU 1
>
> rcu_read_lock()
> x = 1              y = 1
>                    synchronize_rcu()
> r2 = y             r4 = x
> rcu_read_unlock()
>
> BUG_ON(r2 == 0 && r4 == 0)
>
> So looking at srcu-fast, we have:
>
>  * Note that both this_cpu_inc() and atomic_long_inc() are RCU read-side
>  * critical sections either because they disables interrupts, because they
>  * are a single instruction, or because they are a read-modify-write atomic
>  * operation, depending on the whims of the architecture.
>
> It appears to be pairing, as RCU read-side:
>
> - irq off/on implied by this_cpu_inc
> - atomic
> - single instruction
>
> with synchronize_rcu within the grace period, and hope that this behaves as a
> smp_mb pairing preventing the srcu read-side critical section from leaking
> out of the srcu read lock/unlock.
>
> I note that there is a validation that rcu_is_watching() within
> __srcu_read_lock_fast, but it's one thing to have rcu watching, but
> another to have an actual read-side critical section. Note that
> preemption, irqs, softirqs can very well be enabled when calling
> __srcu_read_lock_fast.
>
> My understanding of how memory barriers implemented with RCU
> work is that we need to surround the memory accesses on the fast-path
> (where we turn smp_mb into barrier) with an RCU read-side critical
> section to make sure it does not span across a synchronize_rcu.
>
> What I am missing here is how an RCU read-side that only consists
> of the irq off/on or atomic or single instruction can cover all memory
> accesses we are trying to order, namely those within the srcu
> critical section after the compiler barrier()? Is having RCU
> watching sufficient to guarantee this?

Good eyes!!!

The trick is that this "RCU read-side critical section" consists only of
either this_cpu_inc() or atomic_long_inc(), with the latter only happening
in systems that have NMIs, but don't have NMI-safe per-CPU operations.
Neither this_cpu_inc() nor atomic_long_inc() can be interrupted, and
thus both act as an interrupts-disabled RCU read-side critical section.

Therefore, if the SRCU grace-period computation fails to see an
srcu_read_lock_fast() increment, its earlier code is guaranteed to
happen before the corresponding critical section.  Similarly, if the SRCU
grace-period computation sees an srcu_read_unlock_fast(), its subsequent
code is guaranteed to happen after the corresponding critical section.

Does that help?  If so, would you be interested in nominating a comment?

Or am I missing something subtle here?

Either way, many thanks for digging into this!!!

							Thanx, Paul

> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
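
For reference, the first (smp_mb-based) Dekker pattern quoted above can be
tried out in user space.  The following is only a minimal sketch, not kernel
code: it assumes C11 atomics and pthreads, uses a sequentially consistent
fence as a stand-in for smp_mb(), and the names cpu0(), cpu1() and
count_forbidden_outcomes() are hypothetical helpers made up for illustration.
It does not model the RCU-based variant or srcu-fast itself.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared variables from the quoted litmus test; reset before every trial. */
static atomic_int x, y;
static int r2, r4;

/* "CPU 0": store to x, full fence, then load y. */
static void *cpu0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* assumed stand-in for smp_mb() */
	r2 = atomic_load_explicit(&y, memory_order_relaxed);
	return NULL;
}

/* "CPU 1": store to y, full fence, then load x. */
static void *cpu1(void *arg)
{
	(void)arg;
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* assumed stand-in for smp_mb() */
	r4 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/*
 * Hypothetical helper: run the litmus test `trials` times and count how
 * often the forbidden outcome (r2 == 0 && r4 == 0) is observed.
 * Returns -1 if a thread could not be created.
 */
int count_forbidden_outcomes(int trials)
{
	int forbidden = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t t0, t1;

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		if (pthread_create(&t0, NULL, cpu0, NULL))
			return -1;
		if (pthread_create(&t1, NULL, cpu1, NULL)) {
			pthread_join(t0, NULL);
			return -1;
		}
		/* Joining orders the plain r2/r4 writes before the reads below. */
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (r2 == 0 && r4 == 0)
			forbidden++;	/* corresponds to the BUG_ON() condition */
	}
	return forbidden;
}
```

With both fences in place, count_forbidden_outcomes() should report zero
forbidden outcomes for any number of trials.  Dropping either fence makes the
r2 == 0 && r4 == 0 outcome permitted, though whether it actually shows up
depends on the hardware and on timing.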