[PATCH bpf-next v1 00/11] bpf: propagate read/precision marks over state graph backedges

Eduard Zingerman <eddyz87@xxxxxxxxx> · Sat, 24 May 2025 12:19:21 -0700

Current loop_entry-based states comparison logic does not handle the
following case:

 .-> A --.  Assume the states are visited in the order A, B, C.
 |   |   |  Assume that state B reaches a state equivalent to state A.
 |   v   v  At this point, state C is not processed yet, so state A
 '-- B   C  has not received any read or precision marks from C.
            As a result, these marks won't be propagated to B.

If B has incomplete marks, it is unsafe to use it in states_equal()
checks. This issue was first reported in [1].

This patch-set
--------------

Here is the gist of the algorithm implemented by this patch-set:
- Compute strongly connected components (SCCs) in the program CFG.
- When a verifier state enters an SCC, that state is recorded as the
  SCC's entry point.
- When a verifier state is found to be equivalent to another
  (e.g., B to A in the example above), it is recorded as a
  states-graph backedge.
- Backedges are accumulated per SCC (*).
- When an SCC entry state reaches `branches == 0`, propagate read and
  precision marks through the backedges until a fixed point is reached
  (e.g., from A to B, from C to A, and then again from A to B).

(*) This is an oversimplification, see patch #8 for details.

Unfortunately, this means that commit [2] needs to be reverted,
as precision propagation requires access to jump history,
and backedges represent history not belonging to `env->cur_state`.

Details are provided in patch #8; a comment in `is_state_visited()`
explains most of the mechanics.

Patch #2 adds a `compute_scc()` function, which computes SCCs in the
program CFG. This function was tested using property-based testing in
[3], but it is not included in selftests.

Previous attempt
----------------

A previous attempt to fix this is described in [4]:
1. Within the states loop, `states_equal(... RANGE_WITHIN)` ignores
   read and precision marks.
2. For states outside the loop, all registers for states within the
   loop are marked as read and precise.

This approach led to an 86x regression on the `cond_break1` selftest.
In that test, one loop was followed by another, and a certain variable
was incremented in the second loop. This variable was marked as
precise due to rule (2), which hindered convergence in the first loop.

After some off-list discussion, it was decided that this might be a
typical case and such regressions are undesirable.

This patch-set avoids such eager precision markings.

Alternatives
------------

Another option is to associate a mask of read/written/precise stack
slots with each instruction. This mask can be populated during
verifier states exploration. Upon reaching an `EXIT` instruction or an
equivalent state, the accumulated masks can be used to propagate
read/written/precise bits across the program's control flow graph
using an analysis similar to use-def.

Unfortunately, a naive implementation of this approach [5] results in
a 10x regression in `veristat` for some `sched_ext` programs due to
the inability to express the must-write property. This issue requires
further investigation.

Veristat changes
----------------

There are some veristat regressions when comparing with master using
selftests and sched_ext BPF binaries.

========= selftests: master vs patch-set =========

File                                Program                            Insns (A)  Insns (B)  Insns     (DIFF)
----------------------------------  ---------------------------------  ---------  ---------  ----------------
arena_list.bpf.o                    arena_list_add                           374        406      +32 (+8.56%)
dynptr_success.bpf.o                test_copy_from_user_dynptr               497        498       +1 (+0.20%)
dynptr_success.bpf.o                test_copy_from_user_str_dynptr           268        284      +16 (+5.97%)
dynptr_success.bpf.o                test_probe_read_kernel_dynptr            994       1101    +107 (+10.76%)
dynptr_success.bpf.o                test_probe_read_kernel_str_dynptr        927        965      +38 (+4.10%)
dynptr_success.bpf.o                test_probe_read_user_dynptr             1000       1107    +107 (+10.70%)
dynptr_success.bpf.o                test_probe_read_user_str_dynptr          930        968      +38 (+4.09%)
iters.bpf.o                         checkpoint_states_deletion              1211       1216       +5 (+0.41%)
iters.bpf.o                         clean_live_states                        588        620      +32 (+5.44%)
iters.bpf.o                         iter_subprog_check_stacksafe             128        136       +8 (+6.25%)
pyperf600_iter.bpf.o                on_event                                2591       5929  +3338 (+128.83%)
test_usdt.bpf.o                     usdt12                                  1803       1860      +57 (+3.16%)
verifier_iterating_callbacks.bpf.o  cond_break2                               90        110     +20 (+22.22%)

Total progs: 3597
Old success: 2081
New success: 2081
States diff min:    0.00%
States diff max:  121.88%
   0 .. 5    %: 3589
   5 .. 15   %: 4
  15 .. 25   %: 2
  30 .. 40   %: 1
 120 .. 125  %: 1

========= sched_ext: master vs patch-set =========

File       Program                Insns (A)  Insns (B)  Insns    (DIFF)
---------  ---------------------  ---------  ---------  ---------------
bpf.bpf.o  bpfland_init                 975       1012     +37 (+3.79%)
bpf.bpf.o  chaos_dispatch              4229       4259     +30 (+0.71%)
bpf.bpf.o  chaos_init                  3462       3819   +357 (+10.31%)
bpf.bpf.o  lavd_cpu_offline            5063       5695   +632 (+12.48%)
bpf.bpf.o  lavd_cpu_online             5063       5695   +632 (+12.48%)
bpf.bpf.o  lavd_dispatch              41769      47578  +5809 (+13.91%)
bpf.bpf.o  lavd_enqueue               24190      27749  +3559 (+14.71%)
bpf.bpf.o  lavd_init                   6748       7474   +726 (+10.76%)
bpf.bpf.o  lavd_select_cpu            27243      30802  +3559 (+13.06%)
bpf.bpf.o  layered_dispatch            8909      13295  +4386 (+49.23%)
bpf.bpf.o  layered_dump                1890       2097   +207 (+10.95%)
bpf.bpf.o  layered_enqueue             8154       8207     +53 (+0.65%)
bpf.bpf.o  layered_init                3890       4274    +384 (+9.87%)
bpf.bpf.o  layered_runnable            1834       1868     +34 (+1.85%)
bpf.bpf.o  p2dq_dispatch               1057       1086     +29 (+2.74%)
bpf.bpf.o  p2dq_dispatch               1057       1086     +29 (+2.74%)
bpf.bpf.o  p2dq_init                   3067       3418   +351 (+11.44%)
bpf.bpf.o  p2dq_init                   3067       3418   +351 (+11.44%)
bpf.bpf.o  rusty_init                 35707      35766     +59 (+0.17%)
bpf.bpf.o  tp_cgroup_attach_task        149        203    +54 (+36.24%)

Total progs: 147
Old success: 134
New success: 134
States diff min:    0.00%
States diff max:   45.09%
   0 .. 5    %: 138
   5 .. 15   %: 5
  15 .. 25   %: 2
  25 .. 35   %: 1
  45 .. 50   %: 1

[1] https://lore.kernel.org/bpf/20250312031344.3735498-1-eddyz87@xxxxxxxxx/
[2] commit 96a30e469ca1 ("bpf: use common instruction history across all states")
[3] https://github.com/eddyz87/scc-test
[4] https://lore.kernel.org/bpf/20250426104634.744077-1-eddyz87@xxxxxxxxx/
[5] https://github.com/eddyz87/bpf/tree/propagate-read-and-precision-in-cfg

Eduard Zingerman (11):
  Revert "bpf: use common instruction history across all states"
  bpf: compute SCCs in program control flow graph
  bpf: frame_insn_idx() utility function
  bpf: starting_state parameter for __mark_chain_precision()
  bpf: set 'changed' status if propagate_precision() did any updates
  bpf: set 'changed' status if propagate_liveness() did any updates
  bpf: move REG_LIVE_DONE check to clean_live_states()
  bpf: propagate read/precision marks over state graph backedges
  bpf: remove {update,get}_loop_entry functions
  bpf: include backedges in peak_states stat
  selftests/bpf: tests with a loop state missing read/precision mark

 include/linux/bpf_verifier.h              |  77 +-
 kernel/bpf/verifier.c                     | 968 +++++++++++++++-------
 tools/testing/selftests/bpf/progs/iters.c | 277 +++++++
 3 files changed, 997 insertions(+), 325 deletions(-)

-- 
2.48.1