Re: Is PRE architecture dependent? aarch64 vs x86_64

Segher Boessenkool <segher@xxxxxxxxxxxxxxxxxxx> · Sun, 13 Jul 2025 14:53:10 -0500

Hi!

On Sun, Jul 13, 2025 at 06:28:08PM +0000, Bradley J Lucier via Gcc-help wrote:
> That may be the wrong question, but let me explain what I’m observing.
> 
> With brew’s gcc 15.1.0 on aarch64-apple-darwin24 (I do realize this is not an official GCC release), I’m seeing the following message:

It is close enough: all the patches they carry are just portability
fixes to make things build in their particular circumstances, right?

> _irregex.c: In function '___H___irregex':
> /Users/lucier/programs/gambit/gambit/include/gambit.h:6501:1: warning: PRE disabled: 38820 basic blocks and 164130 registers; increase '--param max-gcse-memory' above 777916 [-Wdisabled-optimization]
>  6501 | }
>       | ^
> /Users/lucier/programs/gambit/gambit/include/gambit.h:2390:19: note: in definition of macro '___SM'
>  2390 | #define ___SM(s,m)s
>       |                   ^
> /Users/lucier/programs/gambit/gambit/include/gambit.h:6462:28: note: in expansion of macro '___END_COD'
>  6462 | #define ___END_M_COD ___SM(___END_COD,___NOTHING)
>       |                            ^~~~~~~~~~
> _irregex.c:155937:1: note: in expansion of macro '___END_M_COD'
> 155937 | ___END_M_COD
>        | ^~~~~~~~~~~~
> 
> This is the largest of a number of routines I’m compiling.  max-gcse-memory has already been set to 400000 to silence similar warnings for a few smaller (but still quite large) routines.

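At first I thought that was still pretty small, but the param is
counted in kilobytes, not bytes, so 400000 is about 400MB.  That is not
so small, and the warning is asking for more than 777916 kB, close to
800MB.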

> My very limited understanding of PRE is that its behavior should not be architecture dependent, so I’m a bit surprised at these warnings.

GCC's GCSE isn't super great, and that is mostly because it tries to do
too many things at once, in one place.  It does LCM (lazy code motion),
for example, which might work better as a fully separate optimisation
pass.  Maybe sharing some data structures, or some library code.

But more separate :-)

> Any advice?

Of course it is somewhat dependent on the architecture.  The number of
available general registers matters, for example!  And aarch64 is far
more orthogonal than x86 in pretty much every way imaginable, which
always gives the compiler a lot more freedom.  So if it wants to
exhaustively search the possibilities (or even just try out many
options), it has many more of them to consider, and more work to do.

The algorithm in and of itself is the same for any arch.

It sounds like this is huge, repetitive code with many big expressions
in it, probably machine-generated (lots of macros, for example), so it
does make sense to increase some params here.

Have you tried that?  What are the results?  Or do reduced params, or
even disabling the optimisation passes completely, work just as well?

Segher