Re: Is PRE architecture dependent? aarch64 vs x86_64

On 18/07/2025 19:11, Florian Weimer via Gcc-help wrote:
> * David Brown:
>
>> Are you able to give an example of the C code for which the
>> optimisation above applies, and values for which the result is
>> affected?  (When thinking about overflows, I always like to use 16-bit
>> int because the numbers are smaller and easier to work with.)
>
> I think this code can turn something like
>
>    (X * 3 - Y * 5) * 7
>
> to
>
>    X * 21 - Y * 35
>
> Plug in X = 715827882 and Y = 429496729, then the original operation
> does not have an overflow (the difference is 1), but the transformed
> expression overflows on both 715827882 * 21 and 429496729 * 35.  So
> this transformation is only correct for -fwrapv because it introduces
> overflow cases that are not present in the original expression.
>
> Thanks,
> Florian



It is perfectly correct to make this transformation when "-fwrapv" is not in effect, as long as the intermediate calculations are done with wrapping instructions.


The user can write the original expression in a "-fno-wrapv" context, promising never to supply X and Y values that cause "X * 3", "Y * 5", "(X * 3 - Y * 5)" or "(X * 3 - Y * 5) * 7" to overflow, and accepting that the compiler may launch nasal daemons if they break that promise.

The compiler can transform it into code that does "X * 21 - Y * 35" and gives correct answers for valid inputs (no one cares what it does for invalid inputs) - as long as it uses two's complement wrapping instructions for the two multiplies.
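As a rough illustration of that point, here is a minimal C sketch that spells out the wrapping version by hand using unsigned arithmetic, which is defined to wrap modulo 2^32. The function names are made up for this example, and it assumes a 32-bit "int" (so that uint32_t operands are not promoted to a wider signed type). It is not a description of what gcc does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: signed arithmetic, so overflow in any step is UB. */
int32_t original(int32_t x, int32_t y)
{
    return (x * 3 - y * 5) * 7;
}

/* Hand-written equivalent of the transformed form using wrapping
 * (unsigned) multiplies and subtract.  The two products may wrap, but
 * the final value is correct modulo 2^32, so it matches the original
 * whenever the original itself does not overflow. */
int32_t transformed_wrapping(int32_t x, int32_t y)
{
    uint32_t ux = (uint32_t)x;
    uint32_t uy = (uint32_t)y;
    uint32_t r = ux * 21u - uy * 35u;
    /* Conversion of values above INT32_MAX is implementation-defined;
     * gcc documents it as reduction modulo 2^32. */
    return (int32_t)r;
}

/* Florian's values: valid for the original, and the wrapping form agrees. */
void check_wrapping_example(void)
{
    assert(original(715827882, 429496729) == 7);
    assert(transformed_wrapping(715827882, 429496729) == 7);
}
```

Both 715827882 * 21 and 429496729 * 35 wrap here, yet the wrapped products still differ by exactly 7, which is the result of the original expression.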

What the compiler cannot do is transform it into "X * 21 - Y * 35" with trapping overflow multiplies or other instructions that have different behaviour.
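To make the trapping case concrete, the sketch below imitates trapping arithmetic with gcc's __builtin_mul_overflow and __builtin_sub_overflow, returning false where a trapping instruction would trap. The function names are hypothetical and this is only a model of the behaviour, not what "-ftrapv" actually emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* (x*3 - y*5)*7 with every step checked; false means "would trap". */
bool original_checked(int32_t x, int32_t y, int32_t *out)
{
    int32_t a, b, d;
    if (__builtin_mul_overflow(x, 3, &a)) return false;
    if (__builtin_mul_overflow(y, 5, &b)) return false;
    if (__builtin_sub_overflow(a, b, &d)) return false;
    return !__builtin_mul_overflow(d, 7, out);
}

/* x*21 - y*35 with every step checked; false means "would trap". */
bool transformed_checked(int32_t x, int32_t y, int32_t *out)
{
    int32_t a, b;
    if (__builtin_mul_overflow(x, 21, &a)) return false;
    if (__builtin_mul_overflow(y, 35, &b)) return false;
    return !__builtin_sub_overflow(a, b, out);
}

/* Florian's values: fine in the original, a trap in the transformed form. */
void check_trapping_example(void)
{
    int32_t r = 0;
    assert(original_checked(715827882, 429496729, &r) && r == 7);
    assert(!transformed_checked(715827882, 429496729, &r));
}
```

With X = 715827882 and Y = 429496729 the original sequence completes and yields 7, while the transformed sequence already fails on the first multiply, so a trapping implementation would change the behaviour for valid inputs.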

I am guessing that this transformation is being done on the internal representation of the code - but I don't know the details of how this works in gcc. (One day, I'd love to have the time to learn.) It might well be that there is no way in that representation to request that these multiplies be two's complement wrapping when the main context is UB overflow. If that's the case, then I can understand why that optimisation is not applied - but that would be a limitation of the internal representation rather than a fundamental issue with the optimisation.

(I realise that gcc is not perfect - just very, very good. There are things it can't handle, and optimisations that would be too much effort to implement for their limited benefit, or take too long at compile time in relation to their run-time gains.)

David




