On Tue, Jul 08, 2025 at 10:22:23AM -0400, Joel Fernandes wrote:
> The details of this are subtle and were discussed recently. Add a
> quick-quiz about this and refer to it from the code, for more clarity.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>

Reviewed-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

> ---
>  .../Data-Structures/Data-Structures.rst       | 32 +++++++++++++++++++
>  kernel/rcu/tree.c                             |  4 +++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
> index 04e16775c752..930535f076b4 100644
> --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
> +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
> @@ -286,6 +286,38 @@ in order to detect the beginnings and ends of grace periods in a
>  distributed fashion. The values flow from ``rcu_state`` to ``rcu_node``
>  (down the tree from the root to the leaves) to ``rcu_data``.
>  
> ++-----------------------------------------------------------------------+
> +| **Quick Quiz**:                                                       |
> ++-----------------------------------------------------------------------+
> +| Given that the root rcu_node structure has a gp_seq field,            |
> +| why does RCU maintain a separate gp_seq in the rcu_state structure?   |
> +| Why not just use the root rcu_node's gp_seq as the official record    |
> +| and update it directly when starting a new grace period?              |
> ++-----------------------------------------------------------------------+
> +| **Answer**:                                                           |
> ++-----------------------------------------------------------------------+
> +| On single-node RCU trees (where the root node is also a leaf),        |
> +| updating the root node's gp_seq immediately would create unnecessary  |
> +| lock contention. Here's why:                                          |
> +|                                                                       |
> +| If we did rcu_seq_start() directly on the root node's gp_seq:         |
> +| 1. All CPUs would notice, in rcu_pending(), that their node's gp_seq  |
> +|    differs from their rdp's gp_seq and would all invoke the RCU core. |
> +| 2. Each would call note_gp_changes() and try to acquire the node lock.|
> +| 3. But rnp->qsmask isn't initialized yet (happens later in            |
> +|    rcu_gp_init()).                                                    |
> +| 4. So each CPU would acquire the lock, find it can't determine if it  |
> +|    needs to report quiescent state (no qsmask), update rdp->gp_seq,   |
> +|    and release the lock.                                              |
> +| 5. Result: Lots of lock acquisitions with no grace period progress.   |
> +|                                                                       |
> +| By having a separate rcu_state.gp_seq, we can increment the official  |
> +| grace period counter without immediately affecting what CPUs see in   |
> +| their nodes. The hierarchical propagation in rcu_gp_init() then       |
> +| updates the root node's gp_seq and qsmask together under the same lock|
> +| acquisition, avoiding this useless contention.                        |
> ++-----------------------------------------------------------------------+
> +
>  Miscellaneous
>  '''''''''''''
>  
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 32fdb66e9c9f..c31b85e62310 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1842,6 +1842,10 @@ static noinline_for_stack bool rcu_gp_init(void)
>  	 * use-after-free errors. For a detailed explanation of this race, see
>  	 * Documentation/RCU/Design/Requirements/Requirements.rst in the
>  	 * "Hotplug CPU" section.
> +	 *
> +	 * Also note that the root rnp's gp_seq is kept separate from, and lags,
> +	 * the rcu_state's gp_seq, for a reason. See the Quick Quiz on
> +	 * single-node systems in Data-Structures.rst for more details.
>  	 */
>  	rcu_seq_start(&rcu_state.gp_seq);
>  	/* Ensure that rcu_seq_done_exact() guardband doesn't give false positives. */
> -- 
> 2.34.1
> 
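To make the lock-contention argument in the quick quiz easier to picture, here is a small
user-space sketch of the two-counter idea. It is only an illustration under toy assumptions:
the names toy_state, toy_node, toy_gp_start() and toy_cpu_sees_new_gp() are invented for this
example, a pthread mutex stands in for the rcu_node ->lock, and none of the kernel's actual
gp_seq encoding or the real rcu_gp_init()/note_gp_changes() logic is modeled. The only point
shown is that the official counter is advanced first, while the node's gp_seq and qsmask
become visible together under a single lock acquisition.

/*
 * toy_gpseq.c - toy model of the separate rcu_state.gp_seq vs. root
 * rnp->gp_seq scheme described in the quick quiz above.  All names here
 * are made up for illustration; this is not the kernel implementation.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_node {
	pthread_mutex_t lock;	/* stands in for the rcu_node ->lock */
	unsigned long gp_seq;	/* copy that CPUs poll locklessly */
	unsigned long qsmask;	/* CPUs that still owe a quiescent state */
};

struct toy_state {
	unsigned long gp_seq;	/* the "official" grace-period counter */
	struct toy_node node;	/* single-node tree: root is also the leaf */
};

static struct toy_state ts = {
	.node = { .lock = PTHREAD_MUTEX_INITIALIZER },
};

/* Start a new grace period. */
static void toy_gp_start(unsigned long cpus_mask)
{
	/* Advance the official counter first; CPUs do not poll this one. */
	ts.gp_seq++;

	/*
	 * Publish the node's gp_seq and qsmask together, under one lock
	 * acquisition, so a polling CPU never sees the new grace period
	 * before qsmask has been initialized.
	 */
	pthread_mutex_lock(&ts.node.lock);
	ts.node.qsmask = cpus_mask;
	ts.node.gp_seq = ts.gp_seq;
	pthread_mutex_unlock(&ts.node.lock);
}

/* Lockless check a CPU does before bothering to take the node lock. */
static bool toy_cpu_sees_new_gp(unsigned long cpu_gp_seq)
{
	return ts.node.gp_seq != cpu_gp_seq;
}

int main(void)
{
	unsigned long cpu0_gp_seq = 0;	/* CPU 0's private snapshot */

	printf("before start: new GP visible? %d\n",
	       toy_cpu_sees_new_gp(cpu0_gp_seq));
	toy_gp_start(0x1UL);		/* only CPU 0 in this toy tree */
	printf("after start:  new GP visible? %d, qsmask=%#lx\n",
	       toy_cpu_sees_new_gp(cpu0_gp_seq), ts.node.qsmask);
	return 0;
}

Built with something like "cc -pthread toy_gpseq.c", the sketch shows the property the quiz
answer relies on: a CPU polling toy_cpu_sees_new_gp() either still sees the old gp_seq and
stays away from the lock, or sees the new one only after qsmask has already been filled in.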