On Fri, 2025-07-04 at 19:14 +0200, Paul Chaignon wrote: > On Thu, Jul 03, 2025 at 11:54:27AM -0700, Eduard Zingerman wrote: > > On Thu, 2025-07-03 at 19:02 +0200, Paul Chaignon wrote: > > > On Tue, Jul 01, 2025 at 06:55:28PM -0700, syzbot wrote: > > > > Hello, > > > > > > > > syzbot found the following issue on: > > > > > > > > HEAD commit: cce3fee729ee selftests/bpf: Enable dynptr/test_probe_read_.. > > > > git tree: bpf-next > > > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=147793d4580000 > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=79da270cec5ffd65 > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=c711ce17dd78e5d4fdcf > > > > compiler: Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6 > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1594e48c580000 > > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1159388c580000 > > > > > > > > Downloadable assets: > > > > disk image: https://storage.googleapis.com/syzbot-assets/f286a7ef4940/disk-cce3fee7.raw.xz > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/e2f2ebe1fdc3/vmlinux-cce3fee7.xz > > > > kernel image: https://storage.googleapis.com/syzbot-assets/6e3070663778/bzImage-cce3fee7.xz > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > Reported-by: syzbot+c711ce17dd78e5d4fdcf@xxxxxxxxxxxxxxxxxxxxxxxxx > > > > > > > > ------------[ cut here ]------------ > > > > verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0x0, 0x0] s64=[0x0, 0x0] u32=[0x1, 0x0] s32=[0x0, 0x0] var_off=(0x0, 0x0)(1) > > > > WARNING: CPU: 1 PID: 5833 at kernel/bpf/verifier.c:2688 reg_bounds_sanity_check+0x6e6/0xc20 kernel/bpf/verifier.c:2682 > > > > > > I'm unsure how to handle this one. > > > > > > One example repro is as follows. > > > > > > 0: call bpf_get_netns_cookie > > > 1: if r0 == 0 goto <exit> > > > 2: if r0 & Oxffffffff goto <exit> > > > > > > The issue is on the path where we fall through both jumps. > > > > > > That path is unreachable at runtime: after insn 1, we know r0 != 0, but > > > with the sign extension on the jset, we would only fallthrough insn 2 > > > if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to > > > figure this out, so the verifier walks all branches. As a result, we end > > > up with inconsistent register ranges on this unreachable path: > > > > > > 0: if r0 == 0 goto <exit> > > > r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff) > > > 1: if r0 & 0xffffffff goto <exit> > > > r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0) > > > r0 after reg_bounds_sync: u64=[0x1, 0] var_off=(0, 0) > > > > > > I suspect there isn't anything specific to these two conditions, and > > > anytime we start walking an unreachable path, we may end up with > > > inconsistent register ranges. The number of times syzkaller is currently > > > hitting this (180 in 1.5 days) suggests there are many different ways to > > > reproduce. > > > > > > We could teach is_branch_taken() about this case, but we probably won't > > > be able to cover all cases. We could stop warning on this, but then we > > > may also miss legitimate cases (i.e., invariants violations on reachable > > > paths). We could also teach reg_bounds_sync() to stop refining the > > > bounds before it gets inconsistent, but I'm unsure how useful that'd be. > > > > Hi Paul, > > > > In general, I think that reg_bounds_sync() can be used as a substitute > > for is_branch_taken() -> whenever an impossible range is produced, > > the branch should be deemed impossible at runtime and abandoned. > > If I recall correctly Andrii considered this too risky some time ago, > > so this warning is in place to catch bugs. > > Hi Eduard, > > Yeah, that feels risky enough that I didn't even dare mention it as a > possibility :) > > > > > Which leaves only the option to refine is_branch_taken(). > > > > I think is_branch_taken() modification should not be too complicated. > > For JSET it only checks tnum, but does not take ranges into account. > > Reasoning about ranges is something along the lines: > > - for unsigned range a = b & CONST -> a is in [b_min & CONST, b_max & CONST]; > > - for signed ranged same thing, but consider two unsigned sub-ranges; > > - for non CONST cases, I think same reasoning can apply, but more > > min/max combinations need to be explored. > > - then check if zero is a member or 'a' range. > > > > Wdyt? > > I might be missing something, but I'm not sure that works. For the > unsigned range, if we have b & 0x2 with b in [2; 10], then we'd end up > with a in [2; 2] and would conclude that the jump is never taken. But > b=8 proves us wrong. I see, what is really needed is an 'or' joined mask of all 'b' values. I need to think how that can be obtained (or approximated). > > > > > The number of times syzkaller is currently hitting this (180 in 1.5 > > > days) suggests there are many different ways to reproduce. > > > > It is a bit inconvenient to read syzbot BPF reports at the moment, > > because it us hard to figure out how the program looks like. > > Do you happen to know how complicated would it be to modify syzbot > > output to: > > - produce a comment with BPF program > > - generating reproducer with a flag, allowing to print level 2 > > verifier log > > ? > > I have the same thought sometimes. Right now, I add verifier logs to a > syz or C reproducer to see the program. Producing the BPF program in a > comment would likely be tricky as we'd need to maintain a disassembler > in syzkaller. So, it operates on raw bytes, not on logical instructions? > Adding verifier logs to reproducers that contain bpf(PROG_LOAD) > calls seems easier. Then I guess we'd get that output in the strace > or console logs of syzbot. The log level 2 might be huge, so it shouldn't be enabled by default. But not having to modify the reproducer before investigation would be helpful.