Re: Is PRE architecture dependent? aarch64 vs x86_64

On 18/07/2025 18:04, Florian Weimer via Gcc-help wrote:
* David Brown via Gcc-help:

On 18/07/2025 16:32, Florian Weimer via Gcc-help wrote:
* Segher Boessenkool:

On Mon, Jul 14, 2025 at 03:03:46PM -0700, Florian Weimer wrote:
* Segher Boessenkool:

-fwrapv is a great way to get slower code, too.  Is there something in
your code that does not work without this reality-distorting flag?

It really depends on the code.  In many cases, -fwrapv enables
additional optimizations.  For example, it's much easier to use (in C
code) the implicit sign bit many CPUs compute for free.
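
One minimal sketch of that kind of idiom - whether this is exactly the case meant is an assumption, and the names are purely illustrative:

/* Serial-number style comparison: "a is before b" even when the
   counters wrap around.  With -fwrapv the subtraction is defined to
   wrap, so this compiles to a subtract plus a test of the sign bit;
   without -fwrapv the subtraction can overflow, which is undefined.  */
#include <stdint.h>

static inline int seq_before(int32_t a, int32_t b)
{
    return (a - b) < 0;
}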

"-fwrapv" in itself does not enable any additional optimisations as
far as I know.  In particular, any time you don't have that flag
activated, and the compiler could generate more efficient code by
using wrapping behaviour for two's complement arithmetic, then it is
free to do so - since signed overflow is undefined behaviour in C, the
compiler can treat it as defined to wrap if that's what suits.

While this is true in principle, it's not how -fwrapv (or undefined signed
overflow) is implemented in GCC.  When writing optimizations, you have
to be careful not to introduce signed overflow that was not present in
the original code because there aren't separate tree operations for
wrapping and overflowing operations.

There aren't many examples like this in the code base today, perhaps
because -fwrapv is not the default and any such optimization would not
get used much.  But here's one:

      /* The last case is if we are a multiply.  In that case, we can
         apply the distributive law to commute the multiply and addition
         if the multiplication of the constants doesn't overflow
         and overflow is defined.  With undefined overflow
         op0 * c might overflow, while (op0 + orig_op1) * c doesn't.
         But fold_plusminus_mult_expr would factor back any power-of-two
         value so do not distribute in the first place in this case.  */
      if (code == MULT_EXPR
          && TYPE_OVERFLOW_WRAPS (ctype)
          && !(tree_fits_shwi_p (c) && pow2p_hwi (absu_hwi (tree_to_shwi (c)))))
        return fold_build2 (tcode, ctype,
                            fold_build2 (code, ctype,
                                         fold_convert (ctype, op0),
                                         fold_convert (ctype, c)),
                            op1);


I am not at all well-versed in the internals of GCC, so I don't know what is going on in that code. But I am not aware of any situation where using wrapping instructions could introduce new overflows that make it through to the final answer. In any combination of additions and multiplications, it doesn't matter when you (logically) apply the modulo operation that limits the range to your bit size - the result is the same.
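
To make that concrete with the small 16-bit numbers I find easiest to work with, here is a minimal sketch that emulates 16-bit wrapping with unsigned arithmetic (the values are just illustrative):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t a = 3, x = 20000, y = 19000;

    /* Reduce modulo 2^16 after every operation ...  */
    uint16_t distributed = (uint16_t)((uint16_t)(a * x) - (uint16_t)(a * y));
    /* ... or only once at the end.  */
    uint16_t factored = (uint16_t)(a * (uint16_t)(x - y));

    /* In 16-bit int arithmetic, a * x = 60000 and a * y = 57000 would
       both overflow (they exceed INT16_MAX), yet after reduction modulo
       2^16 the final answer is 3000 either way.  */
    printf("%u %u\n", (unsigned) distributed, (unsigned) factored);
    return 0;
}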

But what /could/ happen is that you have extra intermediate overflows. If you have "-ftrapv" in action, then "a * (x - y)" and "(a * x) - (a * y)" can have different behaviour if there are overflows in the intermediate parts.
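
A sketch of that difference, assuming 32-bit int (where "-ftrapv" is easy to try) and purely illustrative values:

#include <stdio.h>

int main(void)
{
    int a = 3, x = 1000000000, y = 999999000;

    /* No intermediate overflow: x - y = 1000, so this is just 3000.  */
    printf("%d\n", a * (x - y));

    /* Here a * x = 3000000000 overflows a 32-bit int.  Compiled with
       -ftrapv this aborts before the final 3000 is ever produced;
       without -ftrapv the overflow is simply undefined behaviour.  */
    printf("%d\n", (a * x) - (a * y));
    return 0;
}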

However, when "-ftrapv" is not in effect, I cannot see how "-fwrapv" allows any extra optimisations.

Are you able to give an example of the C code for which the optimisation above applies, and values for which the result is affected? (When thinking about overflows, I always like to use 16-bit int because the numbers are smaller and easier to work with.)


It would be just another extension, and one that many compilers already
enable by default.  Even GCC makes casting from unsigned to int defined
in all cases because doing that in a standard-conforming way is way too
painful.

I may be misunderstanding what you wrote here.  In cases where
something is undefined in the C standards, a compiler can define the
behaviour if it wants - that does not break standards conformance in
any way.

Converting from an unsigned integer type to a signed integer type is
fully defined in the C standards if the value can be represented - the
value is unchanged.  If it cannot be represented (because it is too
big), the result is implementation-defined (or an implementation-defined
signal is raised).  gcc defines this as two's complement wrapping,
i.e. reduction modulo 2^N (which in practice means it generally emits
nothing, as all its targets are two's complement) - that is entirely
standard-conforming.

It's standard-conforming, but GCC forgoes a lot of optimization
opportunities as a result.  Like not doing -fwrapv by default, this
breaks quite a bit of code, of course.  It's also a missed opportunity
for telling more programmers that they can't write correct C code.


Conversion to signed integer types is implementation-defined behaviour in the C standards, not undefined behaviour. That means the compiler must pick a specific behaviour, document it (gcc does so in section 4.5 of the manual), and apply it consistently. It is not undefined behaviour - code that relies on two's complement conversion of unsigned types to signed types is not incorrect code, merely non-portable code. (In practice, of course, it is portable, as all real-world compilers pick the same behaviour on two's complement targets.)
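
For example, here is a small sketch of what that documented behaviour means in practice (assuming the usual 32-bit int):

#include <stdio.h>

int main(void)
{
    unsigned int u = 0xFFFFFFFFu;   /* too big to fit in a 32-bit int */

    /* Implementation-defined: gcc documents that the value is reduced
       modulo 2^N for a signed type of width N, and no signal is raised,
       so this is the two's complement reinterpretation.  */
    int i = (int) u;

    printf("%d\n", i);              /* prints -1 on gcc's targets */
    return 0;
}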


If conversion to signed integer types had involved some undefined behaviour, in the manner of signed integer arithmetic overflow, it would be a different matter - then picking two's complement conversion would have reduced optimisation opportunities and encouraged incorrect code.

David




