Re: [PATCH bpf-next v1 3/4] bpf: use SCC info instead of loop_entry

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Mon, 28 Apr 2025 10:31:20 -0700

On Sat, Apr 26, 2025 at 3:46 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> Replace loop_entry-based exact states comparison logic.
> Instead, for states within an iterator based loop, mark all registers
> as read and precise. Use control flow graph strongly connected
> components information to detect states that are members of a loop.
> See comments for mark_all_regs_read_and_precise() for a detailed
> explanation.
>
> This change addresses the cases described in [1].
> These cases can be illustrated with the following diagram:
>
>  .-> A --.  Assume the states are visited in the order A, B, C.
>  |   |   |  Assume that state B reaches a state equivalent to state A.
>  |   v   v  At this point, state C is not processed yet, so state A
>  '-- B   C  has not received any read or precision marks from C.
>             As a result, these marks won't be propagated to B.
>
> If B has incomplete marks, it is unsafe to use it in states_equal()
> checks.
>
> See selftests later in a series for examples of unsafe programs that
> are not detected without this change.
>
> [1] https://lore.kernel.org/bpf/3c6ac16b7578406e2ddd9ba889ce955748fe636b.camel@xxxxxxxxx/
>
> Fixes: 2a0992829ea3 ("bpf: correct loop detection for iterators convergence")
> Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> ---
>  include/linux/bpf_verifier.h |  42 ++--
>  kernel/bpf/verifier.c        | 440 ++++++++++++++++++-----------------
>  2 files changed, 249 insertions(+), 233 deletions(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index cb8e1ae67180..e718ef265a7b 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -445,16 +445,6 @@ struct bpf_verifier_state {
>         /* first and last insn idx of this verifier state */
>         u32 first_insn_idx;
>         u32 last_insn_idx;
> -       /* If this state is a part of states loop this field points to some
> -        * parent of this state such that:
> -        * - it is also a member of the same states loop;
> -        * - DFS states traversal starting from initial state visits loop_entry
> -        *   state before this state.
> -        * Used to compute topmost loop entry for state loops.
> -        * State loops might appear because of open coded iterators logic.
> -        * See get_loop_entry() for more information.
> -        */
> -       struct bpf_verifier_state *loop_entry;
>         /* Sub-range of env->insn_hist[] corresponding to this state's
>          * instruction history.
>          * Backtracking is using it to go from last to first.
> @@ -466,11 +456,10 @@ struct bpf_verifier_state {
>         u32 dfs_depth;
>         u32 callback_unroll_depth;
>         u32 may_goto_depth;
> -       /* If this state was ever pointed-to by other state's loop_entry field
> -        * this flag would be set to true. Used to avoid freeing such states
> -        * while they are still in use.
> +       /* If this state is a checkpoint at insn_idx that belongs to an SCC,
> +        * record the SCC epoch at the time of checkpoint creation.
>          */

Please use normal kernel comment style for all new code:
/*
 * multi-
 * line
 * comment
 */

> -       u32 used_as_loop_entry;
> +       u32 scc_epoch;
>  };
>
>  #define bpf_get_spilled_reg(slot, frame, mask)                         \
> @@ -717,6 +706,29 @@ struct bpf_idset {
>         u32 ids[BPF_ID_MAP_SIZE];
>  };
>
> +/* Information tracked for CFG strongly connected components */
> +struct bpf_scc_info {
> +       /* True if states_equal(... RANGE_WITHIN) ever returned
> +        * true for a state with insn_idx within this SCC.
> +        * E.g. for iterator next call.

I feel RANGE_WITHIN is unnecessary information here.
Maybe reword it as:
Set to true when is_state_visited() detected convergence of open coded iterator.
?

> +        * Indicates that read and precision marks are incomplete for
> +        * states with insn_idx in this SCC.
> +        */
> +       u32 state_loops_possible:1;
> +       /* Number of verifier states with .branches > 0 that have
> +        * state->parent->insn_idx within this SCC.
> +        * In other words, the number of states originating from this
> +        * SCC that have not yet been fully explored.
> +        */
> +       u32 branches:31;
> +       /* Number of times this SCC was entered by some verifier state
> +        * and that state was fully explored.
> +        * In other words, number of times .branches became non-zero
> +        * and then zero again.
> +        */
> +       u32 scc_epoch;
> +};
> +
>  /* single container for all structs
>   * one verifier_env per bpf_check() call
>   */
> @@ -809,6 +821,8 @@ struct bpf_verifier_env {
>         u64 prev_log_pos, prev_insn_print_pos;
>         /* buffer used to temporary hold constants as scalar registers */
>         struct bpf_reg_state fake_reg[2];
> +       struct bpf_scc_info *scc_info;
> +       u32 num_sccs;
>         /* buffer used to generate temporary string representations,
>          * e.g., in reg_type_str() to generate reg_type string
>          */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 67903270b217..ae642459f342 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1672,7 +1672,7 @@ static void free_verifier_state(struct bpf_verifier_state *state,
>                 kfree(state);
>  }
>
> -/* struct bpf_verifier_state->{parent,loop_entry} refer to states
> +/* struct bpf_verifier_state->parent refers to states
>   * that are in either of env->{expored_states,free_list}.
>   * In both cases the state is contained in struct bpf_verifier_state_list.
>   */
> @@ -1683,36 +1683,18 @@ static struct bpf_verifier_state_list *state_parent_as_list(struct bpf_verifier_
>         return NULL;
>  }
>
> -static struct bpf_verifier_state_list *state_loop_entry_as_list(struct bpf_verifier_state *st)
> -{
> -       if (st->loop_entry)
> -               return container_of(st->loop_entry, struct bpf_verifier_state_list, state);
> -       return NULL;
> -}
> -
>  /* A state can be freed if it is no longer referenced:
>   * - is in the env->free_list;
>   * - has no children states;
> - * - is not used as loop_entry.
> - *
> - * Freeing a state can make it's loop_entry free-able.
>   */
>  static void maybe_free_verifier_state(struct bpf_verifier_env *env,
>                                       struct bpf_verifier_state_list *sl)
>  {
> -       struct bpf_verifier_state_list *loop_entry_sl;
> -
> -       while (sl && sl->in_free_list &&
> -                    sl->state.branches == 0 &&
> -                    sl->state.used_as_loop_entry == 0) {
> -               loop_entry_sl = state_loop_entry_as_list(&sl->state);
> -               if (loop_entry_sl)
> -                       loop_entry_sl->state.used_as_loop_entry--;
> +       if (sl->in_free_list && sl->state.branches == 0) {
>                 list_del(&sl->node);
>                 free_verifier_state(&sl->state, false);
>                 kfree(sl);
>                 env->free_list_size--;
> -               sl = loop_entry_sl;
>         }
>  }
>
> @@ -1753,9 +1735,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
>         dst_state->insn_hist_end = src->insn_hist_end;
>         dst_state->dfs_depth = src->dfs_depth;
>         dst_state->callback_unroll_depth = src->callback_unroll_depth;
> -       dst_state->used_as_loop_entry = src->used_as_loop_entry;
>         dst_state->may_goto_depth = src->may_goto_depth;
> -       dst_state->loop_entry = src->loop_entry;
> +       dst_state->scc_epoch = src->scc_epoch;
>         for (i = 0; i <= src->curframe; i++) {
>                 dst = dst_state->frame[i];
>                 if (!dst) {
> @@ -1798,157 +1779,77 @@ static bool same_callsites(struct bpf_verifier_state *a, struct bpf_verifier_sta
>         return true;
>  }
>
> -/* Open coded iterators allow back-edges in the state graph in order to
> - * check unbounded loops that iterators.
> - *
> - * In is_state_visited() it is necessary to know if explored states are
> - * part of some loops in order to decide whether non-exact states
> - * comparison could be used:
> - * - non-exact states comparison establishes sub-state relation and uses
> - *   read and precision marks to do so, these marks are propagated from
> - *   children states and thus are not guaranteed to be final in a loop;
> - * - exact states comparison just checks if current and explored states
> - *   are identical (and thus form a back-edge).
> - *
> - * Paper "A New Algorithm for Identifying Loops in Decompilation"
> - * by Tao Wei, Jian Mao, Wei Zou and Yu Chen [1] presents a convenient
> - * algorithm for loop structure detection and gives an overview of
> - * relevant terminology. It also has helpful illustrations.
> - *
> - * [1] https://api.semanticscholar.org/CorpusID:15784067
> - *
> - * We use a similar algorithm but because loop nested structure is
> - * irrelevant for verifier ours is significantly simpler and resembles
> - * strongly connected components algorithm from Sedgewick's textbook.
> - *
> - * Define topmost loop entry as a first node of the loop traversed in a
> - * depth first search starting from initial state. The goal of the loop
> - * tracking algorithm is to associate topmost loop entries with states
> - * derived from these entries.
> - *
> - * For each step in the DFS states traversal algorithm needs to identify
> - * the following situations:
> - *
> - *          initial                     initial                   initial
> - *            |                           |                         |
> - *            V                           V                         V
> - *           ...                         ...           .---------> hdr
> - *            |                           |            |            |
> - *            V                           V            |            V
> - *           cur                     .-> succ          |    .------...
> - *            |                      |    |            |    |       |
> - *            V                      |    V            |    V       V
> - *           succ                    '-- cur           |   ...     ...
> - *                                                     |    |       |
> - *                                                     |    V       V
> - *                                                     |   succ <- cur
> - *                                                     |    |
> - *                                                     |    V
> - *                                                     |   ...
> - *                                                     |    |
> - *                                                     '----'
> - *
> - *  (A) successor state of cur   (B) successor state of cur or it's entry
> - *      not yet traversed            are in current DFS path, thus cur and succ
> - *                                   are members of the same outermost loop
> - *
> - *                      initial                  initial
> - *                        |                        |
> - *                        V                        V
> - *                       ...                      ...
> - *                        |                        |
> - *                        V                        V
> - *                .------...               .------...
> - *                |       |                |       |
> - *                V       V                V       V
> - *           .-> hdr     ...              ...     ...
> - *           |    |       |                |       |
> - *           |    V       V                V       V
> - *           |   succ <- cur              succ <- cur
> - *           |    |                        |
> - *           |    V                        V
> - *           |   ...                      ...
> - *           |    |                        |
> - *           '----'                       exit
> - *
> - * (C) successor state of cur is a part of some loop but this loop
> - *     does not include cur or successor state is not in a loop at all.
> - *
> - * Algorithm could be described as the following python code:
> - *
> - *     traversed = set()   # Set of traversed nodes
> - *     entries = {}        # Mapping from node to loop entry
> - *     depths = {}         # Depth level assigned to graph node
> - *     path = set()        # Current DFS path
> - *
> - *     # Find outermost loop entry known for n
> - *     def get_loop_entry(n):
> - *         h = entries.get(n, None)
> - *         while h in entries:
> - *             h = entries[h]
> - *         return h
> - *
> - *     # Update n's loop entry if h comes before n in current DFS path.
> - *     def update_loop_entry(n, h):
> - *         if h in path and depths[entries.get(n, n)] < depths[n]:
> - *             entries[n] = h1
> +static struct bpf_scc_info *insn_scc(struct bpf_verifier_env *env, int insn_idx)
> +{
> +       u32 scc;
> +
> +       scc = env->insn_aux_data[insn_idx].scc;
> +       return scc ? &env->scc_info[scc] : NULL;
> +}
> +
> +/* Returns true iff:
> + * - verifier is currently exploring states with origins in some CFG SCCs;
> + * - st->insn_idx belongs to one of these SCCs;
> + * - st->scc_epoch is the current SCC epoch, indicating that some parent
> + *   of st started current SCC exploration epoch.
>   *
> - *     def dfs(n, depth):
> - *         traversed.add(n)
> - *         path.add(n)
> - *         depths[n] = depth
> - *         for succ in G.successors(n):
> - *             if succ not in traversed:
> - *                 # Case A: explore succ and update cur's loop entry
> - *                 #         only if succ's entry is in current DFS path.
> - *                 dfs(succ, depth + 1)
> - *                 h = entries.get(succ, None)
> - *                 update_loop_entry(n, h)
> - *             else:
> - *                 # Case B or C depending on `h1 in path` check in update_loop_entry().
> - *                 update_loop_entry(n, succ)
> - *         path.remove(n)
> + * When above conditions are true, mark_all_regs_read_and_precise()
> + * has not yet been called for st, meaning that read and precision
> + * marks can't be relied upon.
>   *
> - * To adapt this algorithm for use with verifier:
> - * - use st->branch == 0 as a signal that DFS of succ had been finished
> - *   and cur's loop entry has to be updated (case A), handle this in
> - *   update_branch_counts();
> - * - use st->branch > 0 as a signal that st is in the current DFS path;
> - * - handle cases B and C in is_state_visited().
> + * See comments for mark_all_regs_read_and_precise().
>   */
> -static struct bpf_verifier_state *get_loop_entry(struct bpf_verifier_env *env,
> -                                                struct bpf_verifier_state *st)
> +static bool incomplete_read_marks(struct bpf_verifier_env *env,
> +                                 struct bpf_verifier_state *st)
>  {
> -       struct bpf_verifier_state *topmost = st->loop_entry;
> -       u32 steps = 0;
> +       struct bpf_scc_info *scc_info;
>
> -       while (topmost && topmost->loop_entry) {
> -               if (steps++ > st->dfs_depth) {
> -                       WARN_ONCE(true, "verifier bug: infinite loop in get_loop_entry\n");
> -                       verbose(env, "verifier bug: infinite loop in get_loop_entry()\n");
> -                       return ERR_PTR(-EFAULT);
> -               }
> -               topmost = topmost->loop_entry;
> -       }
> -       return topmost;
> +       scc_info = insn_scc(env, st->insn_idx);
> +       return scc_info &&
> +              scc_info->state_loops_possible &&
> +              scc_info->scc_epoch == st->scc_epoch &&
> +              scc_info->branches > 0;
>  }
>
> -static void update_loop_entry(struct bpf_verifier_env *env,
> -                             struct bpf_verifier_state *cur, struct bpf_verifier_state *hdr)
> +static void mark_state_loops_possible(struct bpf_verifier_env *env,
> +                                     struct bpf_verifier_state *st)
>  {
> -       /* The hdr->branches check decides between cases B and C in
> -        * comment for get_loop_entry(). If hdr->branches == 0 then
> -        * head's topmost loop entry is not in current DFS path,
> -        * hence 'cur' and 'hdr' are not in the same loop and there is
> -        * no need to update cur->loop_entry.
> -        */
> -       if (hdr->branches && hdr->dfs_depth < (cur->loop_entry ?: cur)->dfs_depth) {
> -               if (cur->loop_entry) {
> -                       cur->loop_entry->used_as_loop_entry--;
> -                       maybe_free_verifier_state(env, state_loop_entry_as_list(cur));
> -               }
> -               cur->loop_entry = hdr;
> -               hdr->used_as_loop_entry++;
> +       struct bpf_scc_info *scc_info;
> +
> +       scc_info = insn_scc(env, st->insn_idx);
> +       if (scc_info)
> +               scc_info->state_loops_possible = 1;
> +}
> +
> +/* See comments for bpf_scc_info->{branches,visit_count} and
> + * mark_all_regs_read_and_precise().
> + */
> +static void parent_scc_enter(struct bpf_verifier_env *env, struct bpf_verifier_state *st)
> +{
> +       struct bpf_scc_info *scc_info;
> +
> +       if (!st->parent)
> +               return;
> +       scc_info = insn_scc(env, st->parent->insn_idx);
> +       if (scc_info)
> +               scc_info->branches++;
> +}
> +
> +/* See comments for bpf_scc_info->{branches,visit_count} and
> + * mark_all_regs_read_and_precise().
> + */
> +static void parent_scc_exit(struct bpf_verifier_env *env, struct bpf_verifier_state *st)
> +{
> +       struct bpf_scc_info *scc_info;
> +
> +       if (!st->parent)
> +               return;
> +       scc_info = insn_scc(env, st->parent->insn_idx);
> +       if (scc_info) {
> +               WARN_ON_ONCE(scc_info->branches == 0);
> +               scc_info->branches--;
> +               if (scc_info->branches == 0)
> +                       scc_info->scc_epoch++;
>         }
>  }
>
> @@ -1960,14 +1861,6 @@ static void update_branch_counts(struct bpf_verifier_env *env, struct bpf_verifi
>         while (st) {
>                 u32 br = --st->branches;
>
> -               /* br == 0 signals that DFS exploration for 'st' is finished,
> -                * thus it is necessary to update parent's loop entry if it
> -                * turned out that st is a part of some loop.
> -                * This is a part of 'case A' in get_loop_entry() comment.
> -                */
> -               if (br == 0 && st->parent && st->loop_entry)
> -                       update_loop_entry(env, st->parent, st->loop_entry);
> -
>                 /* WARN_ON(br > 1) technically makes sense here,
>                  * but see comment in push_stack(), hence:
>                  */
> @@ -1976,6 +1869,7 @@ static void update_branch_counts(struct bpf_verifier_env *env, struct bpf_verifi
>                           br);
>                 if (br)
>                         break;
> +               parent_scc_exit(env, st);
>                 parent = st->parent;
>                 parent_sl = state_parent_as_list(st);
>                 if (sl)
> @@ -2053,6 +1947,7 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
>                  * which might have large 'branches' count.
>                  */
>         }
> +       parent_scc_enter(env, &elem->st);
>         return &elem->st;
>  err:
>         free_verifier_state(env->cur_state, true);
> @@ -2242,6 +2137,18 @@ static void __mark_reg64_unbounded(struct bpf_reg_state *reg)
>         reg->umax_value = U64_MAX;
>  }
>
> +static bool is_reg_unbounded(struct bpf_reg_state *reg)
> +{
> +       return reg->smin_value == S64_MIN &&
> +              reg->smax_value == S64_MAX &&
> +              reg->umin_value == 0 &&
> +              reg->umax_value == U64_MAX &&
> +              reg->s32_min_value == S32_MIN &&
> +              reg->s32_max_value == S32_MAX &&
> +              reg->u32_min_value == 0 &&
> +              reg->u32_max_value == U32_MAX;
> +}
> +
>  static void __mark_reg32_unbounded(struct bpf_reg_state *reg)
>  {
>         reg->s32_min_value = S32_MIN;
> @@ -18222,15 +18129,17 @@ static void clean_func_state(struct bpf_verifier_env *env,
>         }
>  }
>
> +static bool verifier_state_cleaned(struct bpf_verifier_state *st)
> +{
> +       /* all regs in this state in all frames were already marked */
> +       return st->frame[0]->regs[0].live & REG_LIVE_DONE;
> +}

move refactor into pre-patch.

> +
>  static void clean_verifier_state(struct bpf_verifier_env *env,
>                                  struct bpf_verifier_state *st)
>  {
>         int i;
>
> -       if (st->frame[0]->regs[0].live & REG_LIVE_DONE)
> -               /* all regs in this state in all frames were already marked */
> -               return;
> -
>         for (i = 0; i <= st->curframe; i++)
>                 clean_func_state(env, st->frame[i]);
>  }
> @@ -18243,6 +18152,114 @@ static u32 frame_insn_idx(struct bpf_verifier_state *st, u32 frame)
>                : st->frame[frame + 1]->callsite;
>  }
>
> +/* Open coded iterators introduce loops in the verifier state graph.
> + * State graph loops can result in incomplete read and precision marks
> + * on individual states. E.g. consider the following states graph:
> + *
> + *  .-> A --.  Assume the states are visited in the order A, B, C.
> + *  |   |   |  Assume that state B reaches a state equivalent to state A.
> + *  |   v   v  At this point, state C has not been processed yet,
> + *  '-- B   C  so state A does not have any read or precision marks from C yet.
> + *             As a result, these marks won't be propagated to B.
> + *
> + * If the marks on B are incomplete, it would be unsafe to use it in
> + * states_equal() checks.

earlier RANGE_WITHIN distinction was unnecessary,
but here we should clarify that
states_equal(NOT_EXACT) is unsafe,
while states_equal(RANGE_WITHIN) is fine.
Right?

> + *
> + * To avoid this safety issue, and since states with incomplete read
> + * marks can only occur within control flow graph loops, the verifier
> + * assumes that any state with bpf_verifier_state->insn_idx residing
> + * in a strongly connected component (SCC) has read and precision
> + * marks for all registers. This assumption is enforced by the
> + * function mark_all_regs_read_and_precise(), which assigns
> + * corresponding marks.
> + *
> + * An intuitive point to call mark_all_regs_read_and_precise() would
> + * be when a new state is created in is_state_visited().
> + * However, doing so would interfere with widen_imprecise_scalars(),
> + * which widens scalars in the current state after checking registers in a
> + * parent state. Registers are not widened if they are marked as precise
> + * in the parent state.
> + *
> + * To avoid interfering with widening logic,
> + * a call to mark_all_regs_read_and_precise() for state is postponed
> + * until no widening is possible in any descendant of state S.
> + *
> + * Another intuitive spot to call mark_all_regs_read_and_precise()
> + * would be in update_branch_counts() when S's branches counter
> + * reaches 0. However, this falls short in the following case:
> + *
> + *     sum = 0
> + *     bpf_repeat(10) {                              // a
> + *             if (unlikely(bpf_get_prandom_u32()))  // b
> + *                     sum += 1;
> + *             if (bpf_get_prandom_u32())            // c
> + *                     asm volatile ("");
> + *             asm volatile ("goto +0;");            // d
> + *     }
> + *
> + * Here a checkpoint is created at (d) with {sum=0} and the branch counter
> + * for (d) reaches 0, so 'sum' would be marked precise.
> + * When second branch of (c) reaches (d), checkpoint would be hit,
> + * and the precision mark for 'sum' propagated to (a).
> + * When the second branch of (b) reaches (a), the state would be {sum=1},
> + * no widening would occur, causing verification to continue forever.
> + *
> + * To avoid such premature precision markings, the verifier postpones
> + * the call to mark_all_regs_read_and_precise() for state S even further.
> + * Suppose state P is a [grand]parent of state S and is the first state
> + * in the current state chain with state->insn_idx within current SCC.
> + * mark_all_regs_read_and_precise() for state S is only called once P
> + * is fully explored.
> + *
> + * The struct 'bpf_scc_info' is used to track this condition:
> + * - bpf_scc_info->branches counts how many states currently
> + *   in env->cur_state or env->head originate from this SCC;

I'm still struggling with this definition of scc_info->branches
and this extra 'enter':

>         cur->insn_hist_start = cur->insn_hist_end;
>         cur->dfs_depth = new->dfs_depth + 1;
>         list_add(&new_sl->node, head);
> +       parent_scc_enter(env, env->cur_state);

since we don't do it for st->branches.

Can we make scc->branches symmetrical to st->branches ?
Only push_stack() will do scc_enter(),
and scc_exit() in update_branch_counts()
even if st->branches > 0.

If that's not possible let's pick a different name for scc_info->branches
to make it clear that the logic is quite different to st->branches.

> + * - bpf_scc_info->scc_epoch counts how many times 'branches'
> + *   has reached zero;
> + * - bpf_verifier_state->scc_epoch records the epoch of the SCC
> + *   corresponding to bpf_verifier_state->insn_idx at the moment
> + *   of state creation.
> + *
> + * Functions parent_scc_enter() and parent_scc_exit() maintain the
> + * bpf_scc_info->{branches,scc_epoch} counters.
> + *
> + * bpf_scc_info->branches reaching zero indicates that state P is
> + * fully explored. Its descendants residing in the same SCC have
> + * state->scc_epoch == scc_info->scc_epoch. parent_scc_exit()
> + * increments scc_info->scc_epoch, allowing clean_live_states() to
> + * detect these states and apply mark_all_regs_read_and_precise().
> + */