Re: [syzbot] [bpf?] WARNING in reg_bounds_sanity_check

Eduard Zingerman <eddyz87@xxxxxxxxx> · Fri, 04 Jul 2025 10:26:14 -0700

On Fri, 2025-07-04 at 19:14 +0200, Paul Chaignon wrote:
> On Thu, Jul 03, 2025 at 11:54:27AM -0700, Eduard Zingerman wrote:
> > On Thu, 2025-07-03 at 19:02 +0200, Paul Chaignon wrote:
> > > On Tue, Jul 01, 2025 at 06:55:28PM -0700, syzbot wrote:
> > > > Hello,
> > > > 
> > > > syzbot found the following issue on:
> > > > 
> > > > HEAD commit:    cce3fee729ee selftests/bpf: Enable dynptr/test_probe_read_..
> > > > git tree:       bpf-next
> > > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=147793d4580000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=79da270cec5ffd65
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf
> > > > compiler:       Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1594e48c580000
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1159388c580000
> > > > 
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/f286a7ef4940/disk-cce3fee7.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/e2f2ebe1fdc3/vmlinux-cce3fee7.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/6e3070663778/bzImage-cce3fee7.xz
> > > > 
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+c711ce17dd78e5d4fdcf@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > 
> > > > ------------[ cut here ]------------
> > > > verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0x0, 0x0] s64=[0x0, 0x0] u32=[0x1, 0x0] s32=[0x0, 0x0] var_off=(0x0, 0x0)(1)
> > > > WARNING: CPU: 1 PID: 5833 at kernel/bpf/verifier.c:2688 reg_bounds_sanity_check+0x6e6/0xc20 kernel/bpf/verifier.c:2682
> > > 
> > > I'm unsure how to handle this one.
> > > 
> > > One example repro is as follows.
> > > 
> > >   0: call bpf_get_netns_cookie
> > >   1: if r0 == 0 goto <exit>
> > >   2: if r0 & Oxffffffff goto <exit>
> > > 
> > > The issue is on the path where we fall through both jumps.
> > > 
> > > That path is unreachable at runtime: after insn 1, we know r0 != 0, but
> > > with the sign extension on the jset, we would only fallthrough insn 2
> > > if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to
> > > figure this out, so the verifier walks all branches. As a result, we end
> > > up with inconsistent register ranges on this unreachable path:
> > > 
> > >   0: if r0 == 0 goto <exit>
> > >     r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff)
> > >   1: if r0 & 0xffffffff goto <exit>
> > >     r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0)
> > >     r0 after reg_bounds_sync:  u64=[0x1, 0] var_off=(0, 0)
> > > 
> > > I suspect there isn't anything specific to these two conditions, and
> > > anytime we start walking an unreachable path, we may end up with
> > > inconsistent register ranges. The number of times syzkaller is currently
> > > hitting this (180 in 1.5 days) suggests there are many different ways to
> > > reproduce.
> > > 
> > > We could teach is_branch_taken() about this case, but we probably won't
> > > be able to cover all cases. We could stop warning on this, but then we
> > > may also miss legitimate cases (i.e., invariants violations on reachable
> > > paths). We could also teach reg_bounds_sync() to stop refining the
> > > bounds before it gets inconsistent, but I'm unsure how useful that'd be.
> > 
> > Hi Paul,
> > 
> > In general, I think that reg_bounds_sync() can be used as a substitute
> > for is_branch_taken() -> whenever an impossible range is produced,
> > the branch should be deemed impossible at runtime and abandoned.
> > If I recall correctly Andrii considered this too risky some time ago,
> > so this warning is in place to catch bugs.
> 
> Hi Eduard,
> 
> Yeah, that feels risky enough that I didn't even dare mention it as a
> possibility :)
> 
> > 
> > Which leaves only the option to refine is_branch_taken().
> > 
> > I think is_branch_taken() modification should not be too complicated.
> > For JSET it only checks tnum, but does not take ranges into account.
> > Reasoning about ranges is something along the lines:
> > - for unsigned range a = b & CONST -> a is in [b_min & CONST, b_max & CONST];
> > - for signed ranged same thing, but consider two unsigned sub-ranges;
> > - for non CONST cases, I think same reasoning can apply, but more
> >   min/max combinations need to be explored.
> > - then check if zero is a member or 'a' range.
> > 
> > Wdyt?
> 
> I might be missing something, but I'm not sure that works. For the
> unsigned range, if we have b & 0x2 with b in [2; 10], then we'd end up
> with a in [2; 2] and would conclude that the jump is never taken. But
> b=8 proves us wrong.

I see, what is really needed is an 'or' joined mask of all 'b' values.
I need to think how that can be obtained (or approximated).

> > 
> > > The number of times syzkaller is currently hitting this (180 in 1.5
> > > days) suggests there are many different ways to reproduce.
> > 
> > It is a bit inconvenient to read syzbot BPF reports at the moment,
> > because it us hard to figure out how the program looks like.
> > Do you happen to know how complicated would it be to modify syzbot
> > output to:
> > - produce a comment with BPF program
> > - generating reproducer with a flag, allowing to print level 2
> >   verifier log
> > ?
> 
> I have the same thought sometimes. Right now, I add verifier logs to a
> syz or C reproducer to see the program. Producing the BPF program in a
> comment would likely be tricky as we'd need to maintain a disassembler
> in syzkaller.

So, it operates on raw bytes, not on logical instructions?

> Adding verifier logs to reproducers that contain bpf(PROG_LOAD)
> calls seems easier. Then I guess we'd get that output in the strace
> or console logs of syzbot.

The log level 2 might be huge, so it shouldn't be enabled by default.
But not having to modify the reproducer before investigation would be
helpful.