Current loop_entry-based states comparison logic does not handle the following case: .-> A --. Assume the states are visited in the order A, B, C. | | | Assume that state B reaches a state equivalent to state A. | v v At this point, state C is not processed yet, so state A '-- B C has not received any read or precision marks from C. As a result, these marks won't be propagated to B. If B has incomplete marks, it is unsafe to use it in states_equal() checks. This issue was first reported in [1]. This patch-set -------------- Here is the gist of the algorithm implemented by this patch-set: - Compute strongly connected components (SCCs) in the program CFG. - When a verifier state enters an SCC, that state is recorded as the SCC's entry point. - When a verifier state is found to be equivalent to another (e.g., B to A in the example above), it is recorded as a states-graph backedge. - Backedges are accumulated per SCC (*). - When an SCC entry state reaches `branches == 0`, propagate read and precision marks through the backedges until a fixed point is reached (e.g., from A to B, from C to A, and then again from A to B). (*) This is an oversimplification, see patch #8 for details. Unfortunately, this means that commit [2] needs to be reverted, as precision propagation requires access to jump history, and backedges represent history not belonging to `env->cur_state`. Details are provided in patch #8; a comment in `is_state_visited()` explains most of the mechanics. Patch #2 adds a `compute_scc()` function, which computes SCCs in the program CFG. This function was tested using property-based testing in [3], but it is not included in selftests. Previous attempt ---------------- A previous attempt to fix this is described in [4]: 1. Within the states loop, `states_equal(... RANGE_WITHIN)` ignores read and precision marks. 2. For states outside the loop, all registers for states within the loop are marked as read and precise. This approach led to an 86x regression on the `cond_break1` selftest. In that test, one loop was followed by another, and a certain variable was incremented in the second loop. This variable was marked as precise due to rule (2), which hindered convergence in the first loop. After some off-list discussion, it was decided that this might be a typical case and such regressions are undesirable. This patch-set avoids such eager precision markings. Alternatives ------------ Another option is to associate a mask of read/written/precise stack slots with each instruction. This mask can be populated during verifier states exploration. Upon reaching an `EXIT` instruction or an equivalent state, the accumulated masks can be used to propagate read/written/precise bits across the program's control flow graph using an analysis similar to use-def. Unfortunately, a naive implementation of this approach [5] results in a 10x regression in `veristat` for some `sched_ext` programs due to the inability to express the must-write property. This issue requires further investigation. Veristat changes ---------------- There are some veristat regressions when comparing with master using selftests and sched_ext BPF binaries. ========= selftests: master vs patch-set ========= File Program Insns (A) Insns (B) Insns (DIFF) ---------------------------------- --------------------------------- --------- --------- ---------------- arena_list.bpf.o arena_list_add 374 406 +32 (+8.56%) dynptr_success.bpf.o test_copy_from_user_dynptr 497 498 +1 (+0.20%) dynptr_success.bpf.o test_copy_from_user_str_dynptr 268 284 +16 (+5.97%) dynptr_success.bpf.o test_probe_read_kernel_dynptr 994 1101 +107 (+10.76%) dynptr_success.bpf.o test_probe_read_kernel_str_dynptr 927 965 +38 (+4.10%) dynptr_success.bpf.o test_probe_read_user_dynptr 1000 1107 +107 (+10.70%) dynptr_success.bpf.o test_probe_read_user_str_dynptr 930 968 +38 (+4.09%) iters.bpf.o checkpoint_states_deletion 1211 1216 +5 (+0.41%) iters.bpf.o clean_live_states 588 620 +32 (+5.44%) iters.bpf.o iter_subprog_check_stacksafe 128 136 +8 (+6.25%) pyperf600_iter.bpf.o on_event 2591 5929 +3338 (+128.83%) test_usdt.bpf.o usdt12 1803 1860 +57 (+3.16%) verifier_iterating_callbacks.bpf.o cond_break2 90 110 +20 (+22.22%) Total progs: 3597 Old success: 2081 New success: 2081 States diff min: 0.00% States diff max: 121.88% 0 .. 5 %: 3589 5 .. 15 %: 4 15 .. 25 %: 2 30 .. 40 %: 1 120 .. 125 %: 1 ========= sched_ext: master vs patch-set ========= File Program Insns (A) Insns (B) Insns (DIFF) --------- --------------------- --------- --------- --------------- bpf.bpf.o bpfland_init 975 1012 +37 (+3.79%) bpf.bpf.o chaos_dispatch 4229 4259 +30 (+0.71%) bpf.bpf.o chaos_init 3462 3819 +357 (+10.31%) bpf.bpf.o lavd_cpu_offline 5063 5695 +632 (+12.48%) bpf.bpf.o lavd_cpu_online 5063 5695 +632 (+12.48%) bpf.bpf.o lavd_dispatch 41769 47578 +5809 (+13.91%) bpf.bpf.o lavd_enqueue 24190 27749 +3559 (+14.71%) bpf.bpf.o lavd_init 6748 7474 +726 (+10.76%) bpf.bpf.o lavd_select_cpu 27243 30802 +3559 (+13.06%) bpf.bpf.o layered_dispatch 8909 13295 +4386 (+49.23%) bpf.bpf.o layered_dump 1890 2097 +207 (+10.95%) bpf.bpf.o layered_enqueue 8154 8207 +53 (+0.65%) bpf.bpf.o layered_init 3890 4274 +384 (+9.87%) bpf.bpf.o layered_runnable 1834 1868 +34 (+1.85%) bpf.bpf.o p2dq_dispatch 1057 1086 +29 (+2.74%) bpf.bpf.o p2dq_dispatch 1057 1086 +29 (+2.74%) bpf.bpf.o p2dq_init 3067 3418 +351 (+11.44%) bpf.bpf.o p2dq_init 3067 3418 +351 (+11.44%) bpf.bpf.o rusty_init 35707 35766 +59 (+0.17%) bpf.bpf.o tp_cgroup_attach_task 149 203 +54 (+36.24%) Total progs: 147 Old success: 134 New success: 134 States diff min: 0.00% States diff max: 45.09% 0 .. 5 %: 138 5 .. 15 %: 5 15 .. 25 %: 2 25 .. 35 %: 1 45 .. 50 %: 1 [1] https://lore.kernel.org/bpf/20250312031344.3735498-1-eddyz87@xxxxxxxxx/ [2] commit 96a30e469ca1 ("bpf: use common instruction history across all states") [3] https://github.com/eddyz87/scc-test [4] https://lore.kernel.org/bpf/20250426104634.744077-1-eddyz87@xxxxxxxxx/ [5] https://github.com/eddyz87/bpf/tree/propagate-read-and-precision-in-cfg Eduard Zingerman (11): Revert "bpf: use common instruction history across all states" bpf: compute SCCs in program control flow graph bpf: frame_insn_idx() utility function bpf: starting_state parameter for __mark_chain_precision() bpf: set 'changed' status if propagate_precision() did any updates bpf: set 'changed' status if propagate_liveness() did any updates bpf: move REG_LIVE_DONE check to clean_live_states() bpf: propagate read/precision marks over state graph backedges bpf: remove {update,get}_loop_entry functions bpf: include backedges in peak_states stat selftests/bpf: tests with a loop state missing read/precision mark include/linux/bpf_verifier.h | 77 +- kernel/bpf/verifier.c | 968 +++++++++++++++------- tools/testing/selftests/bpf/progs/iters.c | 277 +++++++ 3 files changed, 997 insertions(+), 325 deletions(-) -- 2.48.1