On 18/07/2025 19:17, Segher Boessenkool wrote:
Hi!
On Fri, Jul 18, 2025 at 07:11:12PM +0200, David Brown wrote:
I'm getting the feeling that we've got our wires crossed somewhere.
Signed integer /arithmetic/ overflow is UB in the C standards and in gcc
(unless "-fwrapv" is in effect).
Yup.
Conversion to a signed integer type, when the value cannot be preserved, is
implementation-defined behaviour in the C standards, and in gcc (gcc defines
it to be two's complement wrapping).
And leaving it UB is a valid implementation. An implementation is not
required to specify anything in particular.
The standard is pretty clear about what the different classes of
"behaviour" mean - it's right there in section 3.4, under "Terms,
definitions and symbols":
"""
3.4.1
1 implementation-defined behavior
unspecified behavior where each implementation documents how the choice
is made
2 Note 1 to entry: J.3 gives an overview over properties of C programs
that lead to implementation-defined behavior.
3 EXAMPLE An example of implementation-defined behavior is the
propagation of the high-order bit when a signed integer
is shifted right.
3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of
erroneous data, for which this document imposes no requirements
2 Note 1 to entry: Possible undefined behavior ranges from ignoring the
situation completely with unpredictable results, to behaving during
translation or program execution in a documented manner characteristic
of the environment (with or without the issuance of a diagnostic
message), to terminating a translation or execution (with the issuance
of a diagnostic message).
3 Note 2 to entry: J.2 gives an overview over properties of C programs
that lead to undefined behavior.
4 EXAMPLE An example of undefined behavior is the behavior on integer
overflow.
3.4.4
1 unspecified behavior
behavior, that results from the use of an unspecified value, or other
behavior upon which this document provides two or more possibilities and
imposes no further requirements on which is chosen in any instance
2 Note 1 to entry: J.1 gives an overview over properties of C programs
that lead to unspecified behavior.
3 EXAMPLE An example of unspecified behavior is the order in which the
arguments to a function are evaluated.
"""
When the standard says something is "undefined behaviour", the compiler
can treat it any way it wants - including assuming it never happens for
optimisation purposes, or giving trap instructions, or giving clear,
documented and specific behaviour, or making the behaviour depend on
compiler flags, or having the code email your boss and tell them that
you can't program for peanuts. UB is a great idea that lets compilers
optimise more, avoids asking them to do the impossible (such as ensure
that an arbitrary pointer is "valid" before dereferencing), and
encourages programmers to understand the "garbage in, garbage out"
principle.
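To make the UB case concrete, here is a minimal sketch in C. The
function names are made up for illustration. The first function shows
the kind of code a compiler is allowed to simplify by assuming signed
overflow never happens; the second shows one way to test for overflow
before doing the addition, so that the UB is never reached.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical example. Calling this with x == INT_MAX is undefined
 * behaviour, so a compiler may assume that never happens and compile
 * the function to return 1 unconditionally. */
int next_is_greater(int x)
{
    return x + 1 > x;
}

/* Hypothetical helper. Stores a + b in *sum and returns true when the
 * sum fits in an int; returns false, leaving *sum untouched, when the
 * addition would overflow. The range check is done before the
 * addition, so no overflowing arithmetic is ever executed. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```

With gcc, "-fwrapv" is the documented alternative if you really do want
wrapping signed arithmetic rather than a check.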
When the standard says something is "unspecified behaviour", the
compiler can make a choice of the behaviour amongst several options. It
does not need to document these choices, or be consistent about them.
This lets the compiler re-arrange many things (such as order of
evaluation) for optimisation purposes, as long as the result is one of
the possibilities the standard allows.
Padding bits and bytes are, in many circumstances, unspecified values -
the compiler doesn't need to store particular values, but it does have
to ensure they are not "trap values", and it has to maintain a certain
level of consistency (i.e., the values can vary between calls to the
same code, but at any given time, the values have to compare equal to
themselves. That is not required for UB.)
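A small sketch of the argument-evaluation example from 3.4.4, with
hypothetical names throughout. The two calls to note() can run in
either order, so the log may read "ab" or "ba", and the compiler does
not have to tell you which or pick the same one every time. The sum is
3 either way.

```c
#include <stddef.h>

/* Records the order in which the two argument expressions ran. */
static char order_log[3];
static size_t order_len;

/* Appends tag to the log and passes value through unchanged. */
static int note(char tag, int value)
{
    if (order_len < 2)
        order_log[order_len++] = tag;
    return value;
}

static int add(int x, int y)
{
    return x + y;
}

/* The order of the two note() calls is unspecified: either may run
 * first. The calls do not interleave, so this is not UB. */
int sum_with_log(void)
{
    order_len = 0;
    order_log[0] = order_log[1] = order_log[2] = '\0';
    return add(note('a', 1), note('b', 2));
}

/* Returns the log from the most recent sum_with_log() call. */
const char *last_order(void)
{
    return order_log;
}
```

Code that only uses the return value is portable here; code that
depends on the contents of the log is not.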
"Implementation-defined" behaviour is like "unspecified behaviour",
except that the implementation must document what it does. The compiler
can pick strange choices of behaviours if it likes - it could say that
"x >> y", where "x" has a signed integer type and a negative value,
sign-extends when "x" is an "int", zero-extends when "x" is a "long
int", and gives the result 42 when "x" is a "long long int". That's
allowed - but must be documented and consistent. Similarly, converting
an out-of-range value to a signed integer type is allowed to be done in
different ways in different circumstances, but it must be documented and
it must be consistent. If the documentation says odd numbers use
saturation and even numbers are converted to 0, that's conforming. If
the documentation says it is "undefined", that is /not/ conforming. The
programmer must be able to read the documentation and predict the
effects of the implementation-defined behaviour, and rely on that
behaviour working consistently.
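As a sketch of the conversion case (function names are hypothetical):
the plain cast below is implementation-defined when the value does not
fit, and gcc documents the result as reduction modulo 2^N. The second
function gets the same wrapped result using only fully defined
operations, assuming UINT_MAX == 2 * INT_MAX + 1 as on the usual
targets.

```c
#include <limits.h>

/* Implementation-defined when u > INT_MAX. gcc documents this as
 * two's complement wrapping; other compilers must document their own
 * choice. */
int cast_to_int(unsigned int u)
{
    return (int)u;
}

/* Same wrapped result without relying on the implementation-defined
 * conversion. Assumes UINT_MAX == 2 * INT_MAX + 1. */
int wrap_to_int(unsigned int u)
{
    if (u <= (unsigned int)INT_MAX)
        return (int)u;  /* value fits, conversion is fully defined */
    /* u - INT_MAX - 1 is in 0 .. INT_MAX here, so the cast is exact,
     * and adding INT_MIN cannot overflow. */
    return (int)(u - (unsigned int)INT_MAX - 1u) + INT_MIN;
}
```

Neither function involves signed arithmetic overflow, so there is no UB
in either of them.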
"IB" and "UB" are a world apart. "IB" lets you write efficient
low-level code tuned to a specific compiler, target or system, at the
price of portability. It is perfectly fine - indeed crucial to most C
programs - to rely on "IB". It is never appropriate to rely on the
effects of "UB".
A "cast from unsigned to int" is a conversion, thus it is
implementation-defined.
Yup.
I think somewhere along the line in this thread, the conversions and signed
integer arithmetic overflows have been mixed together.
Oh, people do that all the time!
Segher