Re: Is PRE architecture dependent? aarch64 vs x86_64

David Brown via Gcc-help <gcc-help@xxxxxxxxxxx> · Sat, 19 Jul 2025 11:18:33 +0200

On 18/07/2025 19:11, Florian Weimer via Gcc-help wrote:
> * David Brown:
>
>> Are you able to give an example of the C code for which the
>> optimisation above applies, and values for which the result is
>> affected?  (When thinking about overflows, I always like to use 16-bit
>> int because the numbers are smaller and easier to work with.)
>
> I think this code can turn something like
>
>    (X * 3 - Y * 5) * 7
>
> to
>
>    X * 21 - Y * 35
>
> Plug in X = 715827882 and Y = 429496729, then the original operation
> does not have an overflow (the difference is 1), but the transformed
> expression overflows on both 715827882 * 21 and 429496729 * 35.  So
> this transformation is only correct for -fwrapv because it introduces
> overflow cases that are not present in the original expression.
>
> Thanks,
> Florian

It is perfectly correct to make this transformation when "-fwrapv" is 
not in effect, as long as the intermediate calculations are done with 
wrapping instructions.

The user can write the original expression in a "-fno-wrapv" context, 
happy to promise that they will never pass in X and Y values that 
cause "X * 3", "Y * 5", "(X * 3 - Y * 5)" or "(X * 3 - Y * 5) * 7" 
to overflow, and happy to accept that the compiler will launch nasal 
daemons if they break that promise.

The compiler can transform it into code that does "X * 21 - Y * 35" and 
gives correct answers for valid inputs (no one cares what it does for 
invalid inputs) - as long as it uses two's complement wrapping 
instructions for the two multiplies and for the subtraction that 
combines them.
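
As a rough sketch of what I mean, here is hand-written C rather than 
anything gcc actually emits.  The function names are made up for 
illustration, and it assumes a 32-bit int.  Unsigned arithmetic stands 
in for the wrapping instructions, since it is defined to wrap modulo 2^32:

```c
/* The expression as the user wrote it.  Signed overflow here is
   undefined behaviour unless -fwrapv is in effect. */
int original(int x, int y)
{
    return (x * 3 - y * 5) * 7;
}

/* Hypothetical stand-in for the transformed code.  The multiplies and
   the subtraction are done in unsigned arithmetic, which wraps modulo
   2^32 (assuming 32-bit int) and never has undefined behaviour. */
int transformed_wrapping(int x, int y)
{
    unsigned a = (unsigned)x * 21u;
    unsigned b = (unsigned)y * 35u;

    /* Converting an out-of-range unsigned value back to int is
       implementation-defined; gcc defines it as reduction modulo 2^32.
       For inputs that are valid for original(), the true result fits
       in an int, so the wrapped result is exactly that value. */
    return (int)(a - b);
}
```

With Florian's values, X = 715827882 and Y = 429496729, both functions 
return 7, even though both products wrap in the second one.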

What the compiler cannot do is transform it into "X * 21 - Y * 35" with 
trapping overflow multiplies or other instructions that have different 
behaviour.
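
To illustrate the failing case, here is a sketch that uses the 
overflow-checking builtins gcc provides (__builtin_mul_overflow and 
__builtin_sub_overflow) to play the role of trapping instructions.  
Again, the function name is made up, and returning false is just my 
stand-in for "the trap fires":

```c
#include <stdbool.h>

/* Hypothetical model of the transformed expression built from
   overflow-detecting operations.  Returns false where a trapping
   multiply or subtract would have faulted, true otherwise. */
bool transformed_checked(int x, int y, int *result)
{
    int a, b;

    if (__builtin_mul_overflow(x, 21, &a))
        return false;               /* x * 21 does not fit in an int */
    if (__builtin_mul_overflow(y, 35, &b))
        return false;               /* y * 35 does not fit in an int */
    if (__builtin_sub_overflow(a, b, result))
        return false;               /* a - b does not fit in an int */
    return true;
}
```

For X = 715827882 and Y = 429496729 this reports a failure on the first 
multiply, although the original expression is perfectly well defined for 
those inputs.  That is the kind of transformation that is not allowed.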

I am guessing that this transformation is being done on the internal 
representation of the code - but I don't know the details of how that 
works in gcc.  (One day, I'd love to have the time to learn.)  It might 
well be that there is no way in that representation to request that 
these multiplies be two's complement wrapping when the main context is 
UB overflow.  If that's the case, then I can understand why the 
optimisation is not applied - but that would be a limitation of the 
internal representation rather than a fundamental issue with the 
optimisation.

(I realise that gcc is not perfect - just very, very good.  There are 
things it can't handle, and optimisations that would be too much effort 
to implement for their limited benefit, or take too long at compile time 
in relation to their run-time gains.)

David