On Fri, 2025-08-22 at 22:38 +0530, Nandakumar Edamana wrote: > This commit addresses a challenge explained in an open question ("How > can we incorporate correlation in unknown bits across partial > products?") left by Harishankar et al. in their paper: > https://arxiv.org/abs/2105.05398 > > When LSB(a) is uncertain, we know for sure that it is either 0 or 1, > from which we could find two possible partial products and take a > union. Experiment shows that applying this technique in long > multiplication improves the precision in a significant number of cases > (at the cost of losing precision in a relatively lower number of > cases). > > This commit also removes the value-mask decomposition technique > employed by Harishankar et al., as its direct incorporation did not > result in any improvements for the new algorithm. > > Signed-off-by: Nandakumar Edamana <nandakumar@xxxxxxxxxxxxxxxx> > --- I was looking a bit into why the new algo loses to current algo in some cases, and came up with the following explanation of why this algo does not produce "perfect" answer. To compute a most precise result one needs to consider all possible sums that constitute a final answer, e.g. for 0bxx1 * 0b111 possible sums are: 111 + 1110 + 11100 111 + 0000 + 11100 111 + 1110 + 00000 111 + 0000 + 00000 tnum_union of these sums is 00xx0xx1, while new algo produces 00xxxxx1. This happens because 'x' bits in intermediate results "poison" it a bit by accumulating and reducing overall precision. It is not practical to do such sum tree exploration, of course, but I stumbled upon the following simple optimization: @@ -17,7 +17,7 @@ struct tnum tnum_union(struct tnum a, struct tnum b) return TNUM(v & ~mu, mu); } -struct tnum tnum_mul_new(struct tnum a, struct tnum b) +struct tnum __tnum_mul_new(struct tnum a, struct tnum b) { struct tnum acc = TNUM(0, 0); @@ -43,6 +43,14 @@ struct tnum tnum_mul_new(struct tnum a, struct tnum b) return acc; } +struct tnum tnum_mul_new(struct tnum a, struct tnum b) +{ + struct tnum ab = __tnum_mul_new(a, b); + struct tnum ba = __tnum_mul_new(b, a); + + return __builtin_popcountl(ab.mask) < __builtin_popcountl(ba.mask) ? ab : ba; +} + For the 8-bit case I get the following stats (using the same [1] as before): Patch as-is Patch with above modification ----------- ----------------------------- Tnums : 6560 New win: 30086328 70 % 31282549 73 % Old win: 1463809 3 % 907850 2 % Same : 11483463 27 % 10843201 25 % Looks a bit ugly, though. Wdyt? [1] https://github.com/eddyz87/tnum_mul_compare/blob/master/README.md [...]