Hi!

On Wed, Jun 04, 2025 at 09:10:02AM +0100, Tim Froggatt wrote:
> On 03/06/2025 21:57, Segher Boessenkool wrote:
> >It might pay off to write that one directly in assembler
> >code, don't try to manipulate the compiler into ending up with the code
> >you want, just write it!
>
> Using assembler is certainly one thing I am considering, but it will not be
> possible for everything as this is 100,000s of lines of code.

I suggested focussing on the one or few functions that dominate
performance, for this reason :-)

> But at the moment, I am not yet interested in solving any specific problem -
> that will be a job for later. Before I'm ready for that, first I want to
> understand - what do these optimisation options actually do?

As the documentation says:

     Align the start of functions to the next power-of-two greater than
     or equal to N, skipping up to M-1 bytes.  This ensures that at
     least the first M bytes of the function can be fetched by the CPU
     without crossing an N-byte alignment boundary.

     This is an optimization of code performance and alignment is
     ignored for functions considered cold.  If alignment is required
     for all functions, use '-fmin-function-alignment'.

> So my two questions are:
>
> 1) Why is m=1 different from m=2,3,4? Because I thought ARM32 instructions
> were always 4-byte aligned. Shouldn't they produce the same code?

You will need to look at the generated code to see the actual
differences.  Seeing that option A results in a bit faster code than
option B does not show any "why".

> 2) And why are m=1 and m=2,3,4 different to -fno-align-functions? Because I
> would think m=1,2,3,4 would do no alignment.

Show us the generated code.  Or investigate that yourself.  But without
seeing it we cannot say much useful about it.


Segher
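
For reference, the rule in the quoted documentation text can be sketched
as plain arithmetic.  This is only a model of the wording, not of what
GCC actually emits: it assumes padding is inserted only when the gap to
the next boundary is at most M-1 bytes, and the helper names
(next_pow2, place_function) are hypothetical, made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: smallest power of two >= n (n >= 1). */
static uint32_t next_pow2(uint32_t n)
{
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Hypothetical model of -falign-functions=N:M as the documentation
 * reads: align to the next power of two >= N, but only if that means
 * skipping at most M-1 bytes.  Assumes m >= 1.  Returns the address
 * at which the function would start.
 */
static uint32_t place_function(uint32_t addr, uint32_t n, uint32_t m)
{
    uint32_t align = next_pow2(n);
    uint32_t pad = (align - addr % align) % align; /* bytes to boundary */

    return (pad <= m - 1) ? addr + pad : addr;     /* too far: no padding */
}
```

Under this model, and assuming function start addresses are already
4-byte aligned, the gap to a 16-byte boundary is always 0, 4, 8 or 12
bytes, so M=1,2,3,4 would never insert any padding.  Whether the
compiler behaves that way can only be seen in the generated code.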