Re: [PATCH v2 bpf-next] bpf: improve the general precision of tnum_mul

Harishankar Vishwanathan <harishankar.vishwanathan@xxxxxxxxx> · Wed, 20 Aug 2025 02:15:10 -0400

On Fri, 2025-08-15 at 19:35 +0530, Nandakumar Edamana wrote:

[...]

> This commit addresses a challenge explained in an open question ("How
> can we incorporate correlation in unknown bits across partial
> products?") left by Harishankar et al. in their paper:
> https://arxiv.org/abs/2105.05398

This is nice work on improving the precision of the tnum_mul algorithm
Nandakumar. I had a few questions and comments (below).

> When LSB(a) is uncertain, we know for sure that it is either 0 or 1,
> from which we could find two possible partial products and take a
> union. Experiment shows that applying this technique in long
> multiplication improves the precision in a significant number of cases
> (at the cost of losing precision in a relatively lower number of
> cases).

If I understand the idea correctly, when the multiplier's (i.e. tnum a) bit is
unknown, it can either be 0 or 1. If it is 0, then we add nothing to
accumulator, i.e. TNUM(0, 0). If it is 1, we can add b to the accumulator
(appropriately shifted). The main idea is to take the union of these two
possible partial products, and add that to the accumulator. If so, could we also
do the following?

acc = tnum_add(acc, tnum_union(TNUM(0, 0), b));

[...]

> +		else if (a.mask & 1) {
> +			/* acc += tnum_union(acc_0, acc_1), where acc_0 and
> +			 * acc_1 are partial accumulators for cases
> +			 * LSB(a) = certain 0 and LSB(a) = certain 1.
> +			 * acc_0 = acc + 0 * b = acc.
> +			 * acc_1 = acc + 1 * b = tnum_add(acc, b).
> +			 */
> +
> +			acc = tnum_union(acc, tnum_add(acc, b));

The comments in your algorithm seem different from what the code implements: I
believe the comment should read:
/* acc = tnum_union(acc_0, acc_1), where acc_0 and ... */

[...]

> +struct tnum tnum_union(struct tnum a, struct tnum b)
> +{
> +	u64 v = a.value & b.value;
> +	u64 mu = (a.value ^ b.value) | a.mask | b.mask;
> +
> +	return TNUM(v & ~mu, mu);
> +}

The tnum_union() algorithm itself seems correct. I checked this in z3 [1].

[1] https://github.com/bpfverif/tnums-cgo22/blob/05a51a1af6cb72f9694e5034c263ca442b33b976/verification/tnum.py#L801

I also checked this algorithm for the precision gains myself [2]. While the
numbers were not exactly the same as the ones you shared, the algorithm does
seem to be more precise (echoed by Eduard's tests as well). In the below
"our_mul" is the existing algorithm, and "new_mul" is your new algorithm. There
are two outputs, one for the exhaustive enumeration at 8 bitwidth, and another
for 10M randomly sampled 64-bit tnums.

[2] https://github.com/bpfverif/tnums-cgo22/tree/main/precision-new-mul

$ ./precision 8 64 10000000
number of input tnum pairs where our_mul had better precision: 1420899
number of input tnum pairs where new_mul had better precision: 30063672
number of input tnum pairs where output was the same: 11383710
number of input tnum pairs where output was incomparable: 178440

sampled at bitwidth 64
number of input tnum pairs where our_mul had better precision: 66441
number of input tnum pairs where new_mul had better precision: 241588
number of input tnum pairs where output was the same: 9687187
number of input tnum pairs where output was incomparable: 4784

Finally, what is the guarantee that this algorithm is sound? It will be useful
to have a proof to ensure that we are maintaining correctness.

Best,
Harishankar Vishwanathan