Re: Is PRE architecture dependent? aarch64 vs x86_64

David Brown via Gcc-help <gcc-help@xxxxxxxxxxx> · Sat, 19 Jul 2025 12:57:46 +0200

On 18/07/2025 19:17, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Jul 18, 2025 at 07:11:12PM +0200, David Brown wrote:
>> I'm getting the feeling that we've got our wires crossed somewhere.
>>
>> Signed integer /arithmetic/ overflow is UB in the C standards and in gcc
>> (unless "-fwrapv" is in effect).
>
> Yup.
>
>> Conversion to a signed integer type, when the value cannot be preserved, is
>> implementation-defined behaviour in the C standards, and in gcc (gcc defines
>> it to be two's complement wrapping).
>
> And leaving it UB is a valid implementation.  An implementation is not
> required to specify anything in particular.

The standard is pretty clear about what the different classes of 
"behaviour" mean - it's right there in section 3.4 of "Terms,
definitions and symbols":

"""
3.4.1
1 implementation-defined behavior

unspecified behavior where each implementation documents how the choice 
is made

2 Note 1 to entry: J.3 gives an overview over properties of C programs 
that lead to implementation-defined behavior.

3 EXAMPLE An example of implementation-defined behavior is the 
propagation of the high-order bit when a signed integer
is shifted right.

3.4.3
1 undefined behavior

behavior, upon use of a nonportable or erroneous program construct or of 
erroneous data, for which this document imposes no requirements

2 Note 1 to entry: Possible undefined behavior ranges from ignoring the 
situation completely with unpredictable results, to behaving during 
translation or program execution in a documented manner characteristic 
of the environment (with or without the issuance of a diagnostic 
message), to terminating a translation or execution (with the issuance 
of a diagnostic message).

3 Note 2 to entry: J.2 gives an overview over properties of C programs 
that lead to undefined behavior.

4 EXAMPLE An example of undefined behavior is the behavior on integer 
overflow.

3.4.4
1 unspecified behavior

behavior, that results from the use of an unspecified value, or other 
behavior upon which this document provides two or more possibilities and 
imposes no further requirements on which is chosen in any instance

2 Note 1 to entry: J.1 gives an overview over properties of C programs 
that lead to unspecified behavior.

3 EXAMPLE An example of unspecified behavior is the order in which the 
arguments to a function are evaluated.

"""

When the standard says something is "undefined behaviour", the compiler 
can treat it any way it wants - including assuming it never happens for 
optimisation purposes, or giving trap instructions, or giving clear, 
documented and specific behaviour, or making the behaviour depend on 
compiler flags, or having the code email your boss and tell them that 
you can't program for peanuts.  UB is a great idea that lets compilers 
optimise more, avoids asking them to do the impossible (such as ensuring
that an arbitrary pointer is "valid" before dereferencing it), and
encourages programmers to understand the "garbage in, garbage out" 
principle.
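To make the contrast concrete, here is a minimal C sketch (the function names are made up for illustration).  Because signed overflow is UB, gcc is entitled to assume "x + 1 > x" always holds and fold the test to a constant, so checking for overflow after the fact is unreliable.  Testing the operands before the addition, or using gcc's "__builtin_add_overflow" extension, keeps the program free of UB.

```c
#include <limits.h>
#include <stdbool.h>

/* Broken check: if x == INT_MAX, "x + 1" overflows, which is UB.
 * The compiler may assume that never happens and reduce the whole
 * function to "return true". */
bool increment_is_safe_broken(int x)
{
    return x + 1 > x;
}

/* Portable check: test the operands before doing the arithmetic,
 * so an overflowing addition is never evaluated. */
bool add_would_overflow(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) ||
           (b < 0 && a < INT_MIN - b);
}

/* GCC (and Clang) extension: computes a + b as if with infinite
 * precision, stores the result wrapped to int in *res, and returns
 * true if the exact result did not fit.  No UB is involved. */
bool checked_add(int a, int b, int *res)
{
    return __builtin_add_overflow(a, b, res);
}
```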

When the standard says something is "unspecified behaviour", the
compiler can choose the behaviour from among several options.  It
does not need to document these choices, or be consistent about them.
This lets the compiler re-arrange many things (such as order of
evaluation) for optimisation purposes without changing the result of a
correct program.  Padding bits and bytes are, in many circumstances,
unspecified values - the compiler doesn't need to store particular
values, but it does have to ensure they are not "trap values", and it
must maintain a certain level of consistency (i.e., the values can vary
between calls to the same code, but at any given time, the values have
to compare equal to themselves.  That is not required for UB.)
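A small sketch of unspecified behaviour, using the standard's own example of argument evaluation order (the functions "a", "b", "add2" and the "trace" buffer are hypothetical).  The value returned is the same whichever argument is evaluated first; only the order of the side effects differs, and a program that needs a particular order has to impose it itself.

```c
/* Hypothetical trace buffer: records the order in which a() and b()
 * ran.  Call reset_trace() before each experiment. */
char trace[2];
int trace_len = 0;

void reset_trace(void)
{
    trace_len = 0;
}

static int a(void) { trace[trace_len++] = 'a'; return 1; }
static int b(void) { trace[trace_len++] = 'b'; return 2; }

static int add2(int x, int y) { return x + y; }

/* Unspecified: the compiler may call a() before b() or b() before
 * a(), and need not document or repeat its choice.  The sum is 3
 * either way; only the order recorded in trace can differ. */
int unsequenced_call(void)
{
    return add2(a(), b());
}

/* If the order of side effects matters, impose it with separate
 * full expressions: a() completes before b() is called. */
int sequenced_call(void)
{
    int x = a();
    int y = b();
    return add2(x, y);
}
```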

"Implementation-defined" behaviour is like "unspecified behaviour", 
except that the implementation must document what it does.  The compiler 
can pick strange choices of behaviours if it likes - it could say that 
"x >> y", where "x" has a signed integer type and a negative value, works
as though "x" were sign-extended for "int", zero-extended for "long
int", and gives the result 42 when "x" is a "long long int".  That's
allowed - but must be documented and consistent.  Similarly, converting 
an out-of-range value to a signed integer type is allowed to be done in 
different ways in different circumstances, but it must be documented and 
it must be consistent.  If the documentation says odd numbers use 
saturation and even numbers are converted to 0, that's conforming.  If 
the documentation says it is "undefined", that is /not/ conforming.  The 
programmer must be able to read the documentation and predict the 
effects of the implementation-defined behaviour, and rely on that 
behaviour working consistently.
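As a sketch of relying on implementation-defined behaviour versus avoiding it, take the right-shift case.  The gcc manual documents that signed ">>" acts on negative numbers by sign extension, so "asr_gcc" below is fine in gcc-specific code.  "asr_portable" is a hypothetical helper (assuming a shift count smaller than the width of "int") that computes the same floor division by 2^n without depending on that documented choice.

```c
#include <assert.h>
#include <limits.h>

/* Relies on implementation-defined behaviour: gcc documents ">>" on
 * a negative signed value as sign-extending (an arithmetic shift).
 * Another conforming compiler could document something else. */
int asr_gcc(int x, unsigned n)
{
    assert(n < sizeof(int) * CHAR_BIT);
    return x >> n;
}

/* floor(x / 2^n) with no implementation-defined step: for negative
 * x, -1 - x is non-negative and cannot overflow (even for INT_MIN),
 * so only non-negative values are ever shifted. */
int asr_portable(int x, unsigned n)
{
    assert(n < sizeof(int) * CHAR_BIT);
    if (x >= 0)
        return x >> n;
    return -1 - ((-1 - x) >> n);
}
```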

Implementation-defined behaviour ("IB") and UB are worlds apart.  "IB"
lets you write efficient
low-level code tuned to a specific compiler, target or system, at the 
price of portability.  It is perfectly fine - indeed crucial to most C 
programs - to rely on "IB".  It is never appropriate to rely on the 
effects of "UB".
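The same trade-off applies to conversions.  In this minimal sketch (function names hypothetical), the first version relies on gcc's documented behaviour of reducing an out-of-range value modulo 2^N, which is perfectly legitimate in gcc-specific code; the second produces the same result without any implementation-defined step.  Even though C23 mandates two's complement representation, the standard still leaves this conversion implementation-defined.

```c
#include <stdint.h>

/* Relies on implementation-defined behaviour: gcc documents that
 * converting to a signed type of width N reduces the value modulo
 * 2^N, so 0xFFFFFFFF becomes -1. */
int32_t to_signed_ib(uint32_t u)
{
    return (int32_t)u;
}

/* The same mapping with no implementation-defined step: every value
 * converted below fits in int32_t, and the arithmetic cannot
 * overflow (UINT32_MAX - u is at most INT32_MAX on this path). */
int32_t to_signed_portable(uint32_t u)
{
    if (u <= INT32_MAX)
        return (int32_t)u;
    return -(int32_t)(UINT32_MAX - u) - 1;
}
```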

>> A "cast from unsigned to int" is a conversion, thus it is
>> implementation-defined.
>
> Yup.
>
>> I think somewhere along the line in this thread, the conversions and signed
>> integer arithmetic overflows have been mixed together.
>
> Oh, people do that all the time!
>
> Segher