Re: [RFC PATCH 1/2] rcu: Add rcu_read_lock_notrace()

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Thu, 17 Jul 2025 08:18:24 -0700

On Thu, Jul 17, 2025 at 10:46:46AM -0400, Mathieu Desnoyers wrote:
> On 2025-07-17 09:14, Mathieu Desnoyers wrote:
> > On 2025-07-16 18:54, Paul E. McKenney wrote:
> [...]
> > 
> > 2) I think I'm late to the party in reviewing srcu-fast, I'll
> >     go have a look :)
> 
> OK, I'll bite. :) Please let me know where I'm missing something:
> 
> Looking at srcu-lite and srcu-fast, I understand that they fundamentally
> depend on a trick we published here https://lwn.net/Articles/573497/
> "The RCU-barrier menagerie" that allows turning, e.g. this Dekker:
> 
> volatile int x = 0, y = 0
> 
> CPU 0              CPU 1
> 
> x = 1              y = 1
> smp_mb             smp_mb
> r2 = y             r4 = x
> 
> BUG_ON(r2 == 0 && r4 == 0)
> 
> into
> 
> volatile int x = 0, y = 0
> 
> CPU 0              CPU 1
> 
> rcu_read_lock()
> x = 1              y = 1
>                    synchronize_rcu()
> r2 = y             r4 = x
> rcu_read_unlock()
> 
> BUG_ON(r2 == 0 && r4 == 0)
> 
> So looking at srcu-fast, we have:
> 
>  * Note that both this_cpu_inc() and atomic_long_inc() are RCU read-side
>  * critical sections either because they disable interrupts, because they
>  * are a single instruction, or because they are a read-modify-write atomic
>  * operation, depending on the whims of the architecture.
> 
> It appears to be pairing, as RCU read-side:
> 
> - irq off/on implied by this_cpu_inc
> - atomic
> - single instruction
> 
> with synchronize_rcu within the grace period, in the hope that this behaves
> as an smp_mb pairing, preventing the srcu read-side critical section from
> leaking out of the srcu read lock/unlock.
> 
> I note that there is a validation that rcu_is_watching() within
> __srcu_read_lock_fast, but it's one thing to have RCU watching and
> another to have an actual read-side critical section. Note that
> preemption, irqs, and softirqs can very well be enabled when calling
> __srcu_read_lock_fast.
> 
> My understanding of how memory barriers implemented with RCU
> work is that we need to surround the memory accesses on the fast-path
> (where we turn smp_mb into barrier) with an RCU read-side critical
> section to make sure it does not span across a synchronize_rcu.
> 
> What I am missing here is how an RCU read-side critical section that
> consists only of the irq off/on, the atomic, or the single instruction
> can cover all the memory accesses we are trying to order, namely those
> within the srcu critical section after the compiler barrier(). Is
> having RCU watching sufficient to guarantee this?

Good eyes!!!

The trick is that this "RCU read-side critical section" consists only of
either this_cpu_inc() or atomic_long_inc(), with the latter only happening
in systems that have NMIs, but don't have NMI-safe per-CPU operations.
Neither this_cpu_inc() nor atomic_long_inc() can be interrupted, and
thus both act as an interrupts-disabled RCU read-side critical section.

Therefore, if the SRCU grace-period computation fails to see an
srcu_read_lock_fast() increment, its earlier code is guaranteed to
happen before the corresponding critical section.  Similarly, if the SRCU
grace-period computation sees an srcu_read_unlock_fast(), its subsequent
code is guaranteed to happen after the corresponding critical section.
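
As a rough illustration of the counter argument above, here is a user-space
toy model, not the kernel implementation. Every name in it (toy_srcu_ctr,
toy_read_lock_fast(), toy_read_unlock_fast(), toy_readers_done(),
NR_TOY_CPUS) is hypothetical. It assumes that sequentially consistent C11
atomics can stand in for the this_cpu_inc()/synchronize_rcu() pairing
discussed in this thread, and it leaves out SRCU's index flipping entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_TOY_CPUS 4	/* hypothetical fixed CPU count for this sketch */

/* Hypothetical per-CPU counter pair; static storage starts out zeroed. */
struct toy_srcu_ctr {
	atomic_long locks;
	atomic_long unlocks;
};

static struct toy_srcu_ctr toy_ctrs[NR_TOY_CPUS];

/*
 * Reader entry: one indivisible increment.  In this model a seq_cst
 * atomic stands in for this_cpu_inc() or atomic_long_inc().
 */
static void toy_read_lock_fast(int cpu)
{
	assert(cpu >= 0 && cpu < NR_TOY_CPUS);
	atomic_fetch_add(&toy_ctrs[cpu].locks, 1);
}

/* Reader exit: may run on a different CPU than the matching lock. */
static void toy_read_unlock_fast(int cpu)
{
	assert(cpu >= 0 && cpu < NR_TOY_CPUS);
	atomic_fetch_add(&toy_ctrs[cpu].unlocks, 1);
}

/*
 * Grace-period side: sum the unlock counts first, then the lock counts.
 * Assumption: the kernel separates the two scans with synchronize_rcu();
 * here the seq_cst loads provide the ordering instead.
 */
static bool toy_readers_done(void)
{
	long unlocks = 0, locks = 0;
	int cpu;

	for (cpu = 0; cpu < NR_TOY_CPUS; cpu++)
		unlocks += atomic_load(&toy_ctrs[cpu].unlocks);
	for (cpu = 0; cpu < NR_TOY_CPUS; cpu++)
		locks += atomic_load(&toy_ctrs[cpu].locks);
	return locks == unlocks;
}
```

In this model, a reader whose increment is missed by the lock scan cannot
have started its critical section before the scan, and a reader whose
unlock is counted has already finished. The sums, rather than the per-CPU
values, are compared because a reader may migrate between lock and unlock.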

Does that help?  If so, would you be interested in nominating a comment?

Or am I missing something subtle here?

Either way, many thanks for digging into this!!!

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com