Hi!

On Tue, Jun 03, 2025 at 09:16:46PM +0100, Tim Froggatt wrote:
> To give you some context, I'm compiling code that forwards packets on an
> embedded network device. The code accesses the hardware directly, there is
> no operating system. The speed of network traffic is limited by the CPU and
> I'm experimenting with different GCC optimisation options, and seeing how
> fast I can get the same code to forward network traffic.
>
> I don't have my actual results to hand now, but the following example shows
> the pattern that I see... (in real life, the differences are relatively not
> so drastic, but the specific values aren't important)
>
> -fno-align-functions       60 Mbps
>
> -falign-functions=64:1     40 Mbps
> -falign-functions=64:2     70 Mbps
> -falign-functions=64:3     70 Mbps
> -falign-functions=64:4     70 Mbps

Apparently you get a big difference for one of the functions, more than
for others. It might pay off to write that one directly in assembler
code, don't try to manipulate the compiler into ending up with the code
you want, just write it!

> -falign-functions=64:5     90 Mbps
> -falign-functions=64:6     90 Mbps
> -falign-functions=64:7     90 Mbps
> -falign-functions=64:8     90 Mbps
>
> -falign-functions=64:9    100 Mbps
> -falign-functions=64:10   100 Mbps
> -falign-functions=64:11   100 Mbps
> -falign-functions=64:12   100 Mbps
>
> -falign-functions=64:13    50 Mbps
> -falign-functions=64:14    50 Mbps
> -falign-functions=64:15    50 Mbps
> -falign-functions=64:16    50 Mbps
>
> -falign-functions=64:17    40 Mbps
> -falign-functions=64:18    40 Mbps
> -falign-functions=64:19    40 Mbps
> -falign-functions=64:20    40 Mbps
>
> etc...

None of this shows the generated code, so based on this you cannot
really say anything about the generated code. You just show a very
derivative number, "how fast is this (for some very specific testcase)".

And the numbers are very rounded as well (or sampled without reasonable
resolution?)


Segher