Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul

Nandakumar Edamana <nandakumar@xxxxxxxxxxxxxxxx> · Sat, 23 Aug 2025 05:18:50 +0530

On 23/08/25 03:20, Eduard Zingerman wrote:
It is not practical to do such sum tree exploration, of course,
but I stumbled upon the following simple optimization:

   @@ -17,7 +17,7 @@ struct tnum tnum_union(struct tnum a, struct tnum b)
           return TNUM(v & ~mu, mu);
    }

   -struct tnum tnum_mul_new(struct tnum a, struct tnum b)
   +struct tnum __tnum_mul_new(struct tnum a, struct tnum b)
    {
           struct tnum acc = TNUM(0, 0);

   @@ -43,6 +43,14 @@ struct tnum tnum_mul_new(struct tnum a, struct tnum b)
           return acc;
    }

   +struct tnum tnum_mul_new(struct tnum a, struct tnum b)
   +{
   +       struct tnum ab = __tnum_mul_new(a, b);
   +       struct tnum ba = __tnum_mul_new(b, a);
   +
   +       return __builtin_popcountl(ab.mask) < __builtin_popcountl(ba.mask) ? ab : ba;
   +}
   +

For the 8-bit case I get the following stats (using the same [1] as
before):

   Patch as-is                 Patch with above modification
   -----------                 -----------------------------
   Tnums  : 6560
   New win: 30086328    70 %   31282549    73 %
   Old win: 1463809      3 %   907850       2 %
   Same   : 11483463    27 %   10843201    25 %
IIUC, this is same as what my test program [1] does with the 
`--commutative` option, and yes, it improves the precision slightly. 
There is one more possibility: run both the new tnum_mul() and the old 
tnum_mul(), and then pick the best one (still doesn't become optimal, 
based on experiments).
Looks a bit ugly, though.
Wdyt?

Well, I was afraid of the same and that's why it wasn't included in the 
patch.

It is clear that picking the best like this doesn't make it unsound. If 
my understanding is correct, tnum_mul is not something that is called 
very often in usual cases. So it shouldn't affect performance either. 
Then it boils down to the beauty of it. I personally don't think 
`best(a*b, b*a)` is ugly. What about `best(oldprod, newprod)`, where 
oldprod and newprod are each found like this, using the old tnum_mul and 
the new tnum_mul respectively?

[1] https://github.com/nandedamana/improved-tnum-mul

--
Nandakumar