Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul

Eduard Zingerman <eddyz87@xxxxxxxxx> · Fri, 22 Aug 2025 14:50:01 -0700

On Fri, 2025-08-22 at 22:38 +0530, Nandakumar Edamana wrote:
> This commit addresses a challenge explained in an open question ("How
> can we incorporate correlation in unknown bits across partial
> products?") left by Harishankar et al. in their paper:
> https://arxiv.org/abs/2105.05398
> 
> When LSB(a) is uncertain, we know for sure that it is either 0 or 1,
> from which we could find two possible partial products and take a
> union. Experiment shows that applying this technique in long
> multiplication improves the precision in a significant number of cases
> (at the cost of losing precision in a relatively lower number of
> cases).
> 
> This commit also removes the value-mask decomposition technique
> employed by Harishankar et al., as its direct incorporation did not
> result in any improvements for the new algorithm.
> 
> Signed-off-by: Nandakumar Edamana <nandakumar@xxxxxxxxxxxxxxxx>
> ---

I was looking a bit into why the new algo loses to current algo in
some cases, and came up with the following explanation of why this
algo does not produce "perfect" answer.

To compute a most precise result one needs to consider all possible
sums that constitute a final answer, e.g. for 0bxx1 * 0b111 possible
sums are:

  111 + 1110 + 11100
  111 + 0000 + 11100
  111 + 1110 + 00000
  111 + 0000 + 00000

tnum_union of these sums is 00xx0xx1, while new algo produces
00xxxxx1. This happens because 'x' bits in intermediate results
"poison" it a bit by accumulating and reducing overall precision.

It is not practical to do such sum tree exploration, of course,
but I stumbled upon the following simple optimization:

  @@ -17,7 +17,7 @@ struct tnum tnum_union(struct tnum a, struct tnum b)
          return TNUM(v & ~mu, mu);
   }

  -struct tnum tnum_mul_new(struct tnum a, struct tnum b)
  +struct tnum __tnum_mul_new(struct tnum a, struct tnum b)
   {
          struct tnum acc = TNUM(0, 0);

  @@ -43,6 +43,14 @@ struct tnum tnum_mul_new(struct tnum a, struct tnum b)
          return acc;
   }

  +struct tnum tnum_mul_new(struct tnum a, struct tnum b)
  +{
  +       struct tnum ab = __tnum_mul_new(a, b);
  +       struct tnum ba = __tnum_mul_new(b, a);
  +
  +       return __builtin_popcountl(ab.mask) < __builtin_popcountl(ba.mask) ? ab : ba;
  +}
  +

For the 8-bit case I get the following stats (using the same [1] as
before):

  Patch as-is                 Patch with above modification
  -----------                 -----------------------------
  Tnums  : 6560
  New win: 30086328    70 %   31282549    73 %
  Old win: 1463809      3 %   907850       2 %
  Same   : 11483463    27 %   10843201    25 %

Looks a bit ugly, though.
Wdyt?

[1] https://github.com/eddyz87/tnum_mul_compare/blob/master/README.md

[...]