On Sat, Apr 26, 2025 at 3:46 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > Replace loop_entry-based exact states comparison logic. > Instead, for states within an iterator based loop, mark all registers > as read and precise. Use control flow graph strongly connected > components information to detect states that are members of a loop. > See comments for mark_all_regs_read_and_precise() for a detailed > explanation. > > This change addresses the cases described in [1]. > These cases can be illustrated with the following diagram: > > .-> A --. Assume the states are visited in the order A, B, C. > | | | Assume that state B reaches a state equivalent to state A. > | v v At this point, state C is not processed yet, so state A > '-- B C has not received any read or precision marks from C. > As a result, these marks won't be propagated to B. > > If B has incomplete marks, it is unsafe to use it in states_equal() > checks. > > See selftests later in a series for examples of unsafe programs that > are not detected without this change. > > [1] https://lore.kernel.org/bpf/3c6ac16b7578406e2ddd9ba889ce955748fe636b.camel@xxxxxxxxx/ > > Fixes: 2a0992829ea3 ("bpf: correct loop detection for iterators convergence") > Signed-off-by: Eduard Zingerman <eddyz87@xxxxxxxxx> > --- > include/linux/bpf_verifier.h | 42 ++-- > kernel/bpf/verifier.c | 440 ++++++++++++++++++----------------- > 2 files changed, 249 insertions(+), 233 deletions(-) > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h > index cb8e1ae67180..e718ef265a7b 100644 > --- a/include/linux/bpf_verifier.h > +++ b/include/linux/bpf_verifier.h > @@ -445,16 +445,6 @@ struct bpf_verifier_state { > /* first and last insn idx of this verifier state */ > u32 first_insn_idx; > u32 last_insn_idx; > - /* If this state is a part of states loop this field points to some > - * parent of this state such that: > - * - it is also a member of the same states loop; > - * - DFS states traversal starting from initial state visits loop_entry > - * state before this state. > - * Used to compute topmost loop entry for state loops. > - * State loops might appear because of open coded iterators logic. > - * See get_loop_entry() for more information. > - */ > - struct bpf_verifier_state *loop_entry; > /* Sub-range of env->insn_hist[] corresponding to this state's > * instruction history. > * Backtracking is using it to go from last to first. > @@ -466,11 +456,10 @@ struct bpf_verifier_state { > u32 dfs_depth; > u32 callback_unroll_depth; > u32 may_goto_depth; > - /* If this state was ever pointed-to by other state's loop_entry field > - * this flag would be set to true. Used to avoid freeing such states > - * while they are still in use. > + /* If this state is a checkpoint at insn_idx that belongs to an SCC, > + * record the SCC epoch at the time of checkpoint creation. > */ Please use normal kernel comment style for all new code: /* * multi- * line * comment */ > - u32 used_as_loop_entry; > + u32 scc_epoch; > }; > > #define bpf_get_spilled_reg(slot, frame, mask) \ > @@ -717,6 +706,29 @@ struct bpf_idset { > u32 ids[BPF_ID_MAP_SIZE]; > }; > > +/* Information tracked for CFG strongly connected components */ > +struct bpf_scc_info { > + /* True if states_equal(... RANGE_WITHIN) ever returned > + * true for a state with insn_idx within this SCC. > + * E.g. for iterator next call. I feel RANGE_WITHIN is unnecessary information here. Maybe reword it as: Set to true when is_state_visited() detected convergence of open coded iterator. ? > + * Indicates that read and precision marks are incomplete for > + * states with insn_idx in this SCC. > + */ > + u32 state_loops_possible:1; > + /* Number of verifier states with .branches > 0 that have > + * state->parent->insn_idx within this SCC. > + * In other words, the number of states originating from this > + * SCC that have not yet been fully explored. > + */ > + u32 branches:31; > + /* Number of times this SCC was entered by some verifier state > + * and that state was fully explored. > + * In other words, number of times .branches became non-zero > + * and then zero again. > + */ > + u32 scc_epoch; > +}; > + > /* single container for all structs > * one verifier_env per bpf_check() call > */ > @@ -809,6 +821,8 @@ struct bpf_verifier_env { > u64 prev_log_pos, prev_insn_print_pos; > /* buffer used to temporary hold constants as scalar registers */ > struct bpf_reg_state fake_reg[2]; > + struct bpf_scc_info *scc_info; > + u32 num_sccs; > /* buffer used to generate temporary string representations, > * e.g., in reg_type_str() to generate reg_type string > */ > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 67903270b217..ae642459f342 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -1672,7 +1672,7 @@ static void free_verifier_state(struct bpf_verifier_state *state, > kfree(state); > } > > -/* struct bpf_verifier_state->{parent,loop_entry} refer to states > +/* struct bpf_verifier_state->parent refers to states > * that are in either of env->{expored_states,free_list}. > * In both cases the state is contained in struct bpf_verifier_state_list. > */ > @@ -1683,36 +1683,18 @@ static struct bpf_verifier_state_list *state_parent_as_list(struct bpf_verifier_ > return NULL; > } > > -static struct bpf_verifier_state_list *state_loop_entry_as_list(struct bpf_verifier_state *st) > -{ > - if (st->loop_entry) > - return container_of(st->loop_entry, struct bpf_verifier_state_list, state); > - return NULL; > -} > - > /* A state can be freed if it is no longer referenced: > * - is in the env->free_list; > * - has no children states; > - * - is not used as loop_entry. > - * > - * Freeing a state can make it's loop_entry free-able. > */ > static void maybe_free_verifier_state(struct bpf_verifier_env *env, > struct bpf_verifier_state_list *sl) > { > - struct bpf_verifier_state_list *loop_entry_sl; > - > - while (sl && sl->in_free_list && > - sl->state.branches == 0 && > - sl->state.used_as_loop_entry == 0) { > - loop_entry_sl = state_loop_entry_as_list(&sl->state); > - if (loop_entry_sl) > - loop_entry_sl->state.used_as_loop_entry--; > + if (sl->in_free_list && sl->state.branches == 0) { > list_del(&sl->node); > free_verifier_state(&sl->state, false); > kfree(sl); > env->free_list_size--; > - sl = loop_entry_sl; > } > } > > @@ -1753,9 +1735,8 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state, > dst_state->insn_hist_end = src->insn_hist_end; > dst_state->dfs_depth = src->dfs_depth; > dst_state->callback_unroll_depth = src->callback_unroll_depth; > - dst_state->used_as_loop_entry = src->used_as_loop_entry; > dst_state->may_goto_depth = src->may_goto_depth; > - dst_state->loop_entry = src->loop_entry; > + dst_state->scc_epoch = src->scc_epoch; > for (i = 0; i <= src->curframe; i++) { > dst = dst_state->frame[i]; > if (!dst) { > @@ -1798,157 +1779,77 @@ static bool same_callsites(struct bpf_verifier_state *a, struct bpf_verifier_sta > return true; > } > > -/* Open coded iterators allow back-edges in the state graph in order to > - * check unbounded loops that iterators. > - * > - * In is_state_visited() it is necessary to know if explored states are > - * part of some loops in order to decide whether non-exact states > - * comparison could be used: > - * - non-exact states comparison establishes sub-state relation and uses > - * read and precision marks to do so, these marks are propagated from > - * children states and thus are not guaranteed to be final in a loop; > - * - exact states comparison just checks if current and explored states > - * are identical (and thus form a back-edge). > - * > - * Paper "A New Algorithm for Identifying Loops in Decompilation" > - * by Tao Wei, Jian Mao, Wei Zou and Yu Chen [1] presents a convenient > - * algorithm for loop structure detection and gives an overview of > - * relevant terminology. It also has helpful illustrations. > - * > - * [1] https://api.semanticscholar.org/CorpusID:15784067 > - * > - * We use a similar algorithm but because loop nested structure is > - * irrelevant for verifier ours is significantly simpler and resembles > - * strongly connected components algorithm from Sedgewick's textbook. > - * > - * Define topmost loop entry as a first node of the loop traversed in a > - * depth first search starting from initial state. The goal of the loop > - * tracking algorithm is to associate topmost loop entries with states > - * derived from these entries. > - * > - * For each step in the DFS states traversal algorithm needs to identify > - * the following situations: > - * > - * initial initial initial > - * | | | > - * V V V > - * ... ... .---------> hdr > - * | | | | > - * V V | V > - * cur .-> succ | .------... > - * | | | | | | > - * V | V | V V > - * succ '-- cur | ... ... > - * | | | > - * | V V > - * | succ <- cur > - * | | > - * | V > - * | ... > - * | | > - * '----' > - * > - * (A) successor state of cur (B) successor state of cur or it's entry > - * not yet traversed are in current DFS path, thus cur and succ > - * are members of the same outermost loop > - * > - * initial initial > - * | | > - * V V > - * ... ... > - * | | > - * V V > - * .------... .------... > - * | | | | > - * V V V V > - * .-> hdr ... ... ... > - * | | | | | > - * | V V V V > - * | succ <- cur succ <- cur > - * | | | > - * | V V > - * | ... ... > - * | | | > - * '----' exit > - * > - * (C) successor state of cur is a part of some loop but this loop > - * does not include cur or successor state is not in a loop at all. > - * > - * Algorithm could be described as the following python code: > - * > - * traversed = set() # Set of traversed nodes > - * entries = {} # Mapping from node to loop entry > - * depths = {} # Depth level assigned to graph node > - * path = set() # Current DFS path > - * > - * # Find outermost loop entry known for n > - * def get_loop_entry(n): > - * h = entries.get(n, None) > - * while h in entries: > - * h = entries[h] > - * return h > - * > - * # Update n's loop entry if h comes before n in current DFS path. > - * def update_loop_entry(n, h): > - * if h in path and depths[entries.get(n, n)] < depths[n]: > - * entries[n] = h1 > +static struct bpf_scc_info *insn_scc(struct bpf_verifier_env *env, int insn_idx) > +{ > + u32 scc; > + > + scc = env->insn_aux_data[insn_idx].scc; > + return scc ? &env->scc_info[scc] : NULL; > +} > + > +/* Returns true iff: > + * - verifier is currently exploring states with origins in some CFG SCCs; > + * - st->insn_idx belongs to one of these SCCs; > + * - st->scc_epoch is the current SCC epoch, indicating that some parent > + * of st started current SCC exploration epoch. > * > - * def dfs(n, depth): > - * traversed.add(n) > - * path.add(n) > - * depths[n] = depth > - * for succ in G.successors(n): > - * if succ not in traversed: > - * # Case A: explore succ and update cur's loop entry > - * # only if succ's entry is in current DFS path. > - * dfs(succ, depth + 1) > - * h = entries.get(succ, None) > - * update_loop_entry(n, h) > - * else: > - * # Case B or C depending on `h1 in path` check in update_loop_entry(). > - * update_loop_entry(n, succ) > - * path.remove(n) > + * When above conditions are true, mark_all_regs_read_and_precise() > + * has not yet been called for st, meaning that read and precision > + * marks can't be relied upon. > * > - * To adapt this algorithm for use with verifier: > - * - use st->branch == 0 as a signal that DFS of succ had been finished > - * and cur's loop entry has to be updated (case A), handle this in > - * update_branch_counts(); > - * - use st->branch > 0 as a signal that st is in the current DFS path; > - * - handle cases B and C in is_state_visited(). > + * See comments for mark_all_regs_read_and_precise(). > */ > -static struct bpf_verifier_state *get_loop_entry(struct bpf_verifier_env *env, > - struct bpf_verifier_state *st) > +static bool incomplete_read_marks(struct bpf_verifier_env *env, > + struct bpf_verifier_state *st) > { > - struct bpf_verifier_state *topmost = st->loop_entry; > - u32 steps = 0; > + struct bpf_scc_info *scc_info; > > - while (topmost && topmost->loop_entry) { > - if (steps++ > st->dfs_depth) { > - WARN_ONCE(true, "verifier bug: infinite loop in get_loop_entry\n"); > - verbose(env, "verifier bug: infinite loop in get_loop_entry()\n"); > - return ERR_PTR(-EFAULT); > - } > - topmost = topmost->loop_entry; > - } > - return topmost; > + scc_info = insn_scc(env, st->insn_idx); > + return scc_info && > + scc_info->state_loops_possible && > + scc_info->scc_epoch == st->scc_epoch && > + scc_info->branches > 0; > } > > -static void update_loop_entry(struct bpf_verifier_env *env, > - struct bpf_verifier_state *cur, struct bpf_verifier_state *hdr) > +static void mark_state_loops_possible(struct bpf_verifier_env *env, > + struct bpf_verifier_state *st) > { > - /* The hdr->branches check decides between cases B and C in > - * comment for get_loop_entry(). If hdr->branches == 0 then > - * head's topmost loop entry is not in current DFS path, > - * hence 'cur' and 'hdr' are not in the same loop and there is > - * no need to update cur->loop_entry. > - */ > - if (hdr->branches && hdr->dfs_depth < (cur->loop_entry ?: cur)->dfs_depth) { > - if (cur->loop_entry) { > - cur->loop_entry->used_as_loop_entry--; > - maybe_free_verifier_state(env, state_loop_entry_as_list(cur)); > - } > - cur->loop_entry = hdr; > - hdr->used_as_loop_entry++; > + struct bpf_scc_info *scc_info; > + > + scc_info = insn_scc(env, st->insn_idx); > + if (scc_info) > + scc_info->state_loops_possible = 1; > +} > + > +/* See comments for bpf_scc_info->{branches,visit_count} and > + * mark_all_regs_read_and_precise(). > + */ > +static void parent_scc_enter(struct bpf_verifier_env *env, struct bpf_verifier_state *st) > +{ > + struct bpf_scc_info *scc_info; > + > + if (!st->parent) > + return; > + scc_info = insn_scc(env, st->parent->insn_idx); > + if (scc_info) > + scc_info->branches++; > +} > + > +/* See comments for bpf_scc_info->{branches,visit_count} and > + * mark_all_regs_read_and_precise(). > + */ > +static void parent_scc_exit(struct bpf_verifier_env *env, struct bpf_verifier_state *st) > +{ > + struct bpf_scc_info *scc_info; > + > + if (!st->parent) > + return; > + scc_info = insn_scc(env, st->parent->insn_idx); > + if (scc_info) { > + WARN_ON_ONCE(scc_info->branches == 0); > + scc_info->branches--; > + if (scc_info->branches == 0) > + scc_info->scc_epoch++; > } > } > > @@ -1960,14 +1861,6 @@ static void update_branch_counts(struct bpf_verifier_env *env, struct bpf_verifi > while (st) { > u32 br = --st->branches; > > - /* br == 0 signals that DFS exploration for 'st' is finished, > - * thus it is necessary to update parent's loop entry if it > - * turned out that st is a part of some loop. > - * This is a part of 'case A' in get_loop_entry() comment. > - */ > - if (br == 0 && st->parent && st->loop_entry) > - update_loop_entry(env, st->parent, st->loop_entry); > - > /* WARN_ON(br > 1) technically makes sense here, > * but see comment in push_stack(), hence: > */ > @@ -1976,6 +1869,7 @@ static void update_branch_counts(struct bpf_verifier_env *env, struct bpf_verifi > br); > if (br) > break; > + parent_scc_exit(env, st); > parent = st->parent; > parent_sl = state_parent_as_list(st); > if (sl) > @@ -2053,6 +1947,7 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env, > * which might have large 'branches' count. > */ > } > + parent_scc_enter(env, &elem->st); > return &elem->st; > err: > free_verifier_state(env->cur_state, true); > @@ -2242,6 +2137,18 @@ static void __mark_reg64_unbounded(struct bpf_reg_state *reg) > reg->umax_value = U64_MAX; > } > > +static bool is_reg_unbounded(struct bpf_reg_state *reg) > +{ > + return reg->smin_value == S64_MIN && > + reg->smax_value == S64_MAX && > + reg->umin_value == 0 && > + reg->umax_value == U64_MAX && > + reg->s32_min_value == S32_MIN && > + reg->s32_max_value == S32_MAX && > + reg->u32_min_value == 0 && > + reg->u32_max_value == U32_MAX; > +} > + > static void __mark_reg32_unbounded(struct bpf_reg_state *reg) > { > reg->s32_min_value = S32_MIN; > @@ -18222,15 +18129,17 @@ static void clean_func_state(struct bpf_verifier_env *env, > } > } > > +static bool verifier_state_cleaned(struct bpf_verifier_state *st) > +{ > + /* all regs in this state in all frames were already marked */ > + return st->frame[0]->regs[0].live & REG_LIVE_DONE; > +} move refactor into pre-patch. > + > static void clean_verifier_state(struct bpf_verifier_env *env, > struct bpf_verifier_state *st) > { > int i; > > - if (st->frame[0]->regs[0].live & REG_LIVE_DONE) > - /* all regs in this state in all frames were already marked */ > - return; > - > for (i = 0; i <= st->curframe; i++) > clean_func_state(env, st->frame[i]); > } > @@ -18243,6 +18152,114 @@ static u32 frame_insn_idx(struct bpf_verifier_state *st, u32 frame) > : st->frame[frame + 1]->callsite; > } > > +/* Open coded iterators introduce loops in the verifier state graph. > + * State graph loops can result in incomplete read and precision marks > + * on individual states. E.g. consider the following states graph: > + * > + * .-> A --. Assume the states are visited in the order A, B, C. > + * | | | Assume that state B reaches a state equivalent to state A. > + * | v v At this point, state C has not been processed yet, > + * '-- B C so state A does not have any read or precision marks from C yet. > + * As a result, these marks won't be propagated to B. > + * > + * If the marks on B are incomplete, it would be unsafe to use it in > + * states_equal() checks. earlier RANGE_WITHIN distinction was unnecessary, but here we should clarify that states_equal(NOT_EXACT) is unsafe, while states_equal(RANGE_WITHIN) is fine. Right? > + * > + * To avoid this safety issue, and since states with incomplete read > + * marks can only occur within control flow graph loops, the verifier > + * assumes that any state with bpf_verifier_state->insn_idx residing > + * in a strongly connected component (SCC) has read and precision > + * marks for all registers. This assumption is enforced by the > + * function mark_all_regs_read_and_precise(), which assigns > + * corresponding marks. > + * > + * An intuitive point to call mark_all_regs_read_and_precise() would > + * be when a new state is created in is_state_visited(). > + * However, doing so would interfere with widen_imprecise_scalars(), > + * which widens scalars in the current state after checking registers in a > + * parent state. Registers are not widened if they are marked as precise > + * in the parent state. > + * > + * To avoid interfering with widening logic, > + * a call to mark_all_regs_read_and_precise() for state is postponed > + * until no widening is possible in any descendant of state S. > + * > + * Another intuitive spot to call mark_all_regs_read_and_precise() > + * would be in update_branch_counts() when S's branches counter > + * reaches 0. However, this falls short in the following case: > + * > + * sum = 0 > + * bpf_repeat(10) { // a > + * if (unlikely(bpf_get_prandom_u32())) // b > + * sum += 1; > + * if (bpf_get_prandom_u32()) // c > + * asm volatile (""); > + * asm volatile ("goto +0;"); // d > + * } > + * > + * Here a checkpoint is created at (d) with {sum=0} and the branch counter > + * for (d) reaches 0, so 'sum' would be marked precise. > + * When second branch of (c) reaches (d), checkpoint would be hit, > + * and the precision mark for 'sum' propagated to (a). > + * When the second branch of (b) reaches (a), the state would be {sum=1}, > + * no widening would occur, causing verification to continue forever. > + * > + * To avoid such premature precision markings, the verifier postpones > + * the call to mark_all_regs_read_and_precise() for state S even further. > + * Suppose state P is a [grand]parent of state S and is the first state > + * in the current state chain with state->insn_idx within current SCC. > + * mark_all_regs_read_and_precise() for state S is only called once P > + * is fully explored. > + * > + * The struct 'bpf_scc_info' is used to track this condition: > + * - bpf_scc_info->branches counts how many states currently > + * in env->cur_state or env->head originate from this SCC; I'm still struggling with this definition of scc_info->branches and this extra 'enter': > cur->insn_hist_start = cur->insn_hist_end; > cur->dfs_depth = new->dfs_depth + 1; > list_add(&new_sl->node, head); > + parent_scc_enter(env, env->cur_state); since we don't do it for st->branches. Can we make scc->branches symmetrical to st->branches ? Only push_stack() will do scc_enter(), and scc_exit() in update_branch_counts() even if st->branches > 0. If that's not possible let's pick a different name for scc_info->branches to make it clear that the logic is quite different to st->branches. > + * - bpf_scc_info->scc_epoch counts how many times 'branches' > + * has reached zero; > + * - bpf_verifier_state->scc_epoch records the epoch of the SCC > + * corresponding to bpf_verifier_state->insn_idx at the moment > + * of state creation. > + * > + * Functions parent_scc_enter() and parent_scc_exit() maintain the > + * bpf_scc_info->{branches,scc_epoch} counters. > + * > + * bpf_scc_info->branches reaching zero indicates that state P is > + * fully explored. Its descendants residing in the same SCC have > + * state->scc_epoch == scc_info->scc_epoch. parent_scc_exit() > + * increments scc_info->scc_epoch, allowing clean_live_states() to > + * detect these states and apply mark_all_regs_read_and_precise(). > + */