On Tue, Jul 08, 2025 at 10:22:23AM -0400, Joel Fernandes wrote:
> The details of this are subtle and were discussed recently. Add a
> quick-quiz about this and refer to it from the code, for more clarity.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>

Reviewed-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

> ---
>  .../Data-Structures/Data-Structures.rst       | 32 +++++++++++++++++++
>  kernel/rcu/tree.c                             |  4 +++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
> index 04e16775c752..930535f076b4 100644
> --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
> +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
> @@ -286,6 +286,38 @@ in order to detect the beginnings and ends of grace periods in a
>  distributed fashion. The values flow from ``rcu_state`` to ``rcu_node``
>  (down the tree from the root to the leaves) to ``rcu_data``.
>  
> ++-----------------------------------------------------------------------+
> +| **Quick Quiz**:                                                       |
> ++-----------------------------------------------------------------------+
> +| Given that the root rcu_node structure has a gp_seq field,            |
> +| why does RCU maintain a separate gp_seq in the rcu_state structure?   |
> +| Why not just use the root rcu_node's gp_seq as the official record    |
> +| and update it directly when starting a new grace period?              |
> ++-----------------------------------------------------------------------+
> +| **Answer**:                                                           |
> ++-----------------------------------------------------------------------+
> +| On single-node RCU trees (where the root node is also a leaf),        |
> +| updating the root node's gp_seq immediately would create unnecessary  |
> +| lock contention. Here's why:                                          |
> +|                                                                       |
> +| If we did rcu_seq_start() directly on the root node's gp_seq:         |
> +| 1. All CPUs would notice, in rcu_pending(), that their node's gp_seq  |
> +|    differs from their rdp's gp_seq and would all invoke the RCU core. |
> +| 2. Each would call note_gp_changes() and try to acquire the node lock.|
> +| 3. But rnp->qsmask isn't initialized yet (happens later in            |
> +|    rcu_gp_init()).                                                    |
> +| 4. So each CPU would acquire the lock, find it can't determine if it  |
> +|    needs to report quiescent state (no qsmask), update rdp->gp_seq,   |
> +|    and release the lock.                                              |
> +| 5. Result: Lots of lock acquisitions with no grace period progress.   |
> +|                                                                       |
> +| By having a separate rcu_state.gp_seq, we can increment the official  |
> +| grace period counter without immediately affecting what CPUs see in   |
> +| their nodes. The hierarchical propagation in rcu_gp_init() then       |
> +| updates the root node's gp_seq and qsmask together under the same lock|
> +| acquisition, avoiding this useless contention.                        |
> ++-----------------------------------------------------------------------+
> +
>  Miscellaneous
>  '''''''''''''
>  
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 32fdb66e9c9f..c31b85e62310 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1842,6 +1842,10 @@ static noinline_for_stack bool rcu_gp_init(void)
>  	 * use-after-free errors. For a detailed explanation of this race, see
>  	 * Documentation/RCU/Design/Requirements/Requirements.rst in the
>  	 * "Hotplug CPU" section.
> +	 *
> +	 * Also note that the root rnp's gp_seq is kept separate from, and lags,
> +	 * the rcu_state's gp_seq, for a reason. See the Quick Quiz on
> +	 * single-node systems in Data-Structures.rst for more details.
>  	 */
>  	rcu_seq_start(&rcu_state.gp_seq);
>  	/* Ensure that rcu_seq_done_exact() guardband doesn't give false positives. */
> -- 
> 2.34.1
> 
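To make the lock-contention argument in the quick quiz easier to picture, here is a small
user-space sketch of the two-counter idea. It is only an illustration under toy assumptions:
the names toy_state, toy_node, toy_gp_start() and toy_cpu_sees_new_gp() are invented for this
example, a pthread mutex stands in for the rcu_node ->lock, and none of the kernel's actual
gp_seq encoding or the real rcu_gp_init()/note_gp_changes() logic is modeled. The only point
shown is that the official counter is advanced first, while the node's gp_seq and qsmask
become visible together under a single lock acquisition.

/*
 * toy_gpseq.c - toy model of the separate rcu_state.gp_seq vs. root
 * rnp->gp_seq scheme described in the quick quiz above.  All names here
 * are made up for illustration; this is not the kernel implementation.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_node {
	pthread_mutex_t lock;	/* stands in for the rcu_node ->lock */
	unsigned long gp_seq;	/* copy that CPUs poll locklessly */
	unsigned long qsmask;	/* CPUs that still owe a quiescent state */
};

struct toy_state {
	unsigned long gp_seq;	/* the "official" grace-period counter */
	struct toy_node node;	/* single-node tree: root is also the leaf */
};

static struct toy_state ts = {
	.node = { .lock = PTHREAD_MUTEX_INITIALIZER },
};

/* Start a new grace period. */
static void toy_gp_start(unsigned long cpus_mask)
{
	/* Advance the official counter first; CPUs do not poll this one. */
	ts.gp_seq++;

	/*
	 * Publish the node's gp_seq and qsmask together, under one lock
	 * acquisition, so a polling CPU never sees the new grace period
	 * before qsmask has been initialized.
	 */
	pthread_mutex_lock(&ts.node.lock);
	ts.node.qsmask = cpus_mask;
	ts.node.gp_seq = ts.gp_seq;
	pthread_mutex_unlock(&ts.node.lock);
}

/* Lockless check a CPU does before bothering to take the node lock. */
static bool toy_cpu_sees_new_gp(unsigned long cpu_gp_seq)
{
	return ts.node.gp_seq != cpu_gp_seq;
}

int main(void)
{
	unsigned long cpu0_gp_seq = 0;	/* CPU 0's private snapshot */

	printf("before start: new GP visible? %d\n",
	       toy_cpu_sees_new_gp(cpu0_gp_seq));
	toy_gp_start(0x1UL);		/* only CPU 0 in this toy tree */
	printf("after start:  new GP visible? %d, qsmask=%#lx\n",
	       toy_cpu_sees_new_gp(cpu0_gp_seq), ts.node.qsmask);
	return 0;
}

Built with something like "cc -pthread toy_gpseq.c", the sketch shows the property the quiz
answer relies on: a CPU polling toy_cpu_sees_new_gp() either still sees the old gp_seq and
stays away from the lock, or sees the new one only after qsmask has already been filled in.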