On 23/08/25 03:20, Eduard Zingerman wrote:
It is not practical to do such sum tree exploration, of course,
but I stumbled upon the following simple optimization:
@@ -17,7 +17,7 @@ struct tnum tnum_union(struct tnum a, struct tnum b)
return TNUM(v & ~mu, mu);
}
-struct tnum tnum_mul_new(struct tnum a, struct tnum b)
+struct tnum __tnum_mul_new(struct tnum a, struct tnum b)
{
struct tnum acc = TNUM(0, 0);
@@ -43,6 +43,14 @@ struct tnum tnum_mul_new(struct tnum a, struct tnum b)
return acc;
}
+struct tnum tnum_mul_new(struct tnum a, struct tnum b)
+{
+ struct tnum ab = __tnum_mul_new(a, b);
+ struct tnum ba = __tnum_mul_new(b, a);
+
+ return __builtin_popcountl(ab.mask) < __builtin_popcountl(ba.mask) ? ab : ba;
+}
+
For the 8-bit case I get the following stats (using the same [1] as
before):
Patch as-is Patch with above modification
----------- -----------------------------
Tnums : 6560
New win: 30086328 70 % 31282549 73 %
Old win: 1463809 3 % 907850 2 %
Same : 11483463 27 % 10843201 25 %
IIUC, this is same as what my test program [1] does with the
`--commutative` option, and yes, it improves the precision slightly.
There is one more possibility: run both the new tnum_mul() and the old
tnum_mul(), and then pick the best one (still doesn't become optimal,
based on experiments).
Looks a bit ugly, though.
Wdyt?
Well, I was afraid of the same and that's why it wasn't included in the
patch.
It is clear that picking the best like this doesn't make it unsound. If
my understanding is correct, tnum_mul is not something that is called
very often in usual cases. So it shouldn't affect performance either.
Then it boils down to the beauty of it. I personally don't think
`best(a*b, b*a)` is ugly. What about `best(oldprod, newprod)`, where
oldprod and newprod are each found like this, using the old tnum_mul and
the new tnum_mul respectively?
[1] https://github.com/nandedamana/improved-tnum-mul
--
Nandakumar