Re: [PATCH v4 bpf-next 1/2] bpf: improve the general precision of tnum_mul

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 23/08/25 03:20, Eduard Zingerman wrote:
It is not practical to do such sum tree exploration, of course,
but I stumbled upon the following simple optimization:

   @@ -17,7 +17,7 @@ struct tnum tnum_union(struct tnum a, struct tnum b)
           return TNUM(v & ~mu, mu);
    }
-struct tnum tnum_mul_new(struct tnum a, struct tnum b)
   +struct tnum __tnum_mul_new(struct tnum a, struct tnum b)
    {
           struct tnum acc = TNUM(0, 0);
@@ -43,6 +43,14 @@ struct tnum tnum_mul_new(struct tnum a, struct tnum b)
           return acc;
    }
+struct tnum tnum_mul_new(struct tnum a, struct tnum b)
   +{
   +       struct tnum ab = __tnum_mul_new(a, b);
   +       struct tnum ba = __tnum_mul_new(b, a);
   +
   +       return __builtin_popcountl(ab.mask) < __builtin_popcountl(ba.mask) ? ab : ba;
   +}
   +

For the 8-bit case I get the following stats (using the same [1] as
before):

   Patch as-is                 Patch with above modification
   -----------                 -----------------------------
   Tnums  : 6560
   New win: 30086328    70 %   31282549    73 %
   Old win: 1463809      3 %   907850       2 %
   Same   : 11483463    27 %   10843201    25 %
IIUC, this is same as what my test program [1] does with the `--commutative` option, and yes, it improves the precision slightly. There is one more possibility: run both the new tnum_mul() and the old tnum_mul(), and then pick the best one (still doesn't become optimal, based on experiments).
Looks a bit ugly, though.
Wdyt?

Well, I was afraid of the same and that's why it wasn't included in the patch.

It is clear that picking the best like this doesn't make it unsound. If my understanding is correct, tnum_mul is not something that is called very often in usual cases. So it shouldn't affect performance either. Then it boils down to the beauty of it. I personally don't think `best(a*b, b*a)` is ugly. What about `best(oldprod, newprod)`, where oldprod and newprod are each found like this, using the old tnum_mul and the new tnum_mul respectively?

[1] https://github.com/nandedamana/improved-tnum-mul

--
Nandakumar





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux