Re: Tips to compile very very giant code


 



On Wed, 2025-02-12 at 10:09 +0100, Basile Starynkevitch wrote:
> On Wed, 2025-02-12 at 09:25 +0100, Florian Weimer via Gcc-help wrote:
> > * Bento Borges Schirmer:
> > 
> > > [3]
> > > https://github.com/bottle2/swf2c/blob/88f9ccb7912d55002e87f1efb11f21720d97e4ec/tests/thousands-of-functions.c
> > 
> > You should turn L and B into proper functions instead of macros, then
> > compilation time will decrease significantly.  If compilation time is
> > still too high, consider adopting a table-based approach.
> 
> 
> And your compiled C or C++ code should preferably be made of translation units
> (C or C++ files) not bigger than about ten thousands lines each.
> 
> Observe that the C++ code of GCC don't have any source file bigger than 60KLOC
> (the biggest one being ./gcc/cp/parser.cc and ./libstdc++-
> v3/testsuite/20_util/to_chars/double.cc ...) and that generated C++ code (e.g.
> _GccTrunk/gcc/insn-dfatab.c ...) has at most 210KLOC.
> 
> 
> See also https://arxiv.org/abs/1109.0779 and (if you want to generate C code
> which is huge to benchmark compile time) 
> https://github.com/bstarynk/misc-basile/blob/master/manydl.c
> 
> My advice is to refactor your human written C or C++ files to have no more
> than
> ten thousands lines each. (with C++ templates the compilation time can still
> take dozen of minutes in pathological cases).


You mention also that:


b) one 307725-line-long function [4] takes 2 hours on `clang -Oz` and
10 minutes on `clang -O0`


But such a huge function cannot be written by a human being: it is not
understandable and not readable.

So that 307725-line-long function is by necessity generated by some software
tool.

With the (now obsolete) GCC MELT plugin described in
https://arxiv.org/abs/1109.0779 I encountered the same issue.

And I had to work on the code generator to split huge generated code functions
into more manageable pieces.

IIRC my C++ code generator (the GCC MELT plugin) split huge blocks (at C++
generation time) into separate static functions. This took me two weeks of
work (or maybe three). It was almost 15 years ago, so I have forgotten the
details.
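
To illustrate the splitting idea (this is only a minimal sketch, not the MELT
code, whose details I no longer have), a generator could emit at most `chunk`
statements per static function and then a small driver calling the parts in
order. The names emit_split, part_N and generated_entry are hypothetical, and
the sketch assumes the statements are independent of each other (shared state
would have to live in globals or in a struct passed to every part).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write `n` independent statements to `out`, at most
   `chunk` per static function, then a driver calling each part in order.
   Returns the number of parts emitted. */
size_t emit_split(FILE *out, const char *const *stmts,
                  size_t n, size_t chunk)
{
    assert(out != NULL && chunk > 0);
    size_t parts = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % chunk == 0) {           /* open a new part */
            fprintf(out, "static void part_%zu(void)\n{\n", parts);
            parts++;
        }
        fprintf(out, "    %s\n", stmts[i]);
        if ((i + 1) % chunk == 0 || i + 1 == n)
            fputs("}\n\n", out);        /* close the current part */
    }
    /* Driver that replaces the former single huge function. */
    fputs("void generated_entry(void)\n{\n", out);
    for (size_t p = 0; p < parts; p++)
        fprintf(out, "    part_%zu();\n", p);
    fputs("}\n", out);
    return parts;
}
```

A real generator would pick the split points at block boundaries so that no
local variable or jump crosses from one part into another.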

If the C++ code generator is not your own, you should report this as a bug to
its supplier.

https://stackoverflow.com/a/36474352/841108 is indeed relevant.

Notice also that huge C++ functions obviously put heavy stress on the compiler.
In my past informal experience on compilation time with gcc -O2 of randomly
generated functions (by
https://github.com/bstarynk/misc-basile/blob/master/manydl.c ...) the
compilation time seems quadratic in the number of lines of the source of a
single C or C++ function (more exactly quadratic in the number of GIMPLE
statements).
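
As a back-of-envelope illustration of why splitting helps, here is a small
sketch that assumes a purely quadratic cost model (cost of one function of n
statements = n*n, in arbitrary units). The model and the names model_cost and
split_cost are my assumptions for illustration; real compilers also have
per-function overhead and non-quadratic passes, so treat the result as an
order of magnitude only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed model: compiling one function of n statements costs n*n units. */
double model_cost(size_t n)
{
    return (double)n * (double)n;
}

/* Model cost when `total` statements are split into functions holding at
   most `chunk` statements each (the last one may be shorter). */
double split_cost(size_t total, size_t chunk)
{
    assert(chunk > 0);
    size_t full = total / chunk;   /* number of full-size functions */
    size_t rest = total % chunk;   /* size of the last, shorter one */
    return (double)full * model_cost(chunk) + model_cost(rest);
}
```

Under that model, model_cost(307725) / split_cost(307725, 10000) is roughly 31,
i.e. cutting the 307725-line function into pieces of ten thousand lines would
divide the modelled cost by about the number of pieces.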

And register allocators cannot behave well on such huge functions. Actually any
large function (more than a few thousand GIMPLE statements or lines) is very
likely to be compiled to "slow" code (even by gcc -O3 or by clang -O3).

So my advice is really to work on the generating tool (the software emitting C
or C++ code) to make it generate functions not bigger than about ten thousand
lines (or GIMPLE statements) each.

This even applies to "obvious" C++ generators like
https://www.fltk.org/doc-2.0/html/fluid.html

FWIW, Jacques Pitrat (the French symbolic AI research pioneer) also generated C
code (see his last book Artificial Beings - The conscience of a conscious
machine ISTE, Wiley, March 2009. ISBN 978-1848211018) and observed the same
issue. His generated code is on the web at
https://github.com/bstarynk/caia-pitrat and he had to spend time improving his
code generator (the same system itself, an expert system in 1990 terminology) to
lower the size of every generated routine to a reasonable number of lines.
His blog is still online, but sadly he has passed away:
http://bootstrappingartificialintelligence.fr/WordPress3/

If you want to generate a few huge functions and don't care about their
performance (e.g. because they are initialization functions running once in a
Linux process) you might use a simpler compiler (maybe nwcc or tinycc) to
compile them, or generate naive machine code with libraries like asmjit.com or
GNU lightning (see https://www.gnu.org/software/lightning/).

Alternatively, also emit GCC-specific #pragma-s to force that single huge C
function to be compiled with the equivalent of -O0.
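
A minimal sketch of what the generator could emit around such a function,
using GCC's function-specific option pragmas. The function
huge_generated_init is a tiny hypothetical stand-in for the huge generated
one. The pragmas are guarded because, as far as I know, Clang does not
implement `#pragma GCC optimize` (it has its own `#pragma clang optimize off`
and the optnone attribute instead), and the GCC manual describes the
optimize attribute as intended for debugging, so this is a workaround
rather than a clean solution.

```c
#include <assert.h>

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("O0")   /* functions defined below are compiled as with -O0 */
#endif

/* Hypothetical stand-in for a huge generated initialization function. */
int huge_generated_init(int *table, int n)
{
    assert(table != 0 && n >= 0);
    for (int i = 0; i < n; i++)
        table[i] = i * 2;
    return n;
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options       /* restore the command-line optimization level */
#endif
```

The rest of the translation unit, after the pop_options pragma, is still
compiled with whatever -O level was given on the command line.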

I do recommend emitting its C or C++ code in a different translation unit, which
would be compiled without optimization, e.g. with -O0. And you could use a
"worse" C compiler like tinycc (see
http://download.savannah.gnu.org/releases/tinycc/ ...) or nwcc (see
https://nwcc.sourceforge.net/ ...).

The same advice applies when using libgccjit; see
https://gcc.gnu.org/onlinedocs/jit/

My summary: don't expect the GCC (or Clang) compiler to compile "well" (e.g.
quickly and with effective optimization) a generated C or C++ (or Fortran)
function with a hundred thousand statements.


Regards.

-- 
Basile STARYNKEVITCH           <basile@xxxxxxxxxxxxxxxxx>
8 rue de la Faïencerie
92340 Bourg-la-Reine,          France
http://starynkevitch.net/Basile & https://github.com/bstarynk 



