Re: Minimum requirements for a custom libc

Segher Boessenkool <segher@xxxxxxxxxxxxxxxxxxx> · Mon, 2 Jun 2025 16:28:58 -0500

On Mon, Jun 02, 2025 at 05:16:42PM +0100, Richard Earnshaw (lists) via Gcc-help wrote:
> On 02/06/2025 15:49, Jonathan Wakely wrote:
> > On Mon, 2 Jun 2025 at 14:24, Richard Earnshaw (lists) wrote:
> >> $ /work/rearnsha/scratch/gnu/gcc/aarch64/master/gcc/xgcc -B /work/rearnsha/scratch/gnu/gcc/aarch64/master/gcc/ -I ~/gnusrc/newlib/master/newlib/libc/include/ -O2 -march=armv8-a+mops -o - -S /tmp/mem.c
> >>         .arch armv8-a+mops
> >> f:
> >>         cpyfp   [x0]!, [x1]!, x2!
> >>         cpyfm   [x0]!, [x1]!, x2!
> >>         cpyfe   [x0]!, [x1]!, x2!
> >>         ret
> > 
> > Ah, thanks for the correction!
> > 
> > For x86_64 both gcc and clang emit a call to memcpy:
> > 
> > https://godbolt.org/z/hGvbM4df8
> 
> AArch64 would do so as well if you don't have the MOPS extension.  As I said, the limit, if any, is a target implementation choice; it's generally driven by the amount of code bloat that picking the best strategy would require.

Same on Power.  On most architectures you can write a faster memcpy
routine if you can spend as much code on it as you want.  But on Arm
with MOPS, for example, a few small insns are all you need, and on more
embedded targets you cannot go faster than a loop that copies one word
per cycle anyway, and you can write pretty good code for that.  (You can
then implement the libc memcpy() as just a __builtin_memcpy(), great
fun!)
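
To illustrate the "word per cycle" loop, here is a minimal sketch, not
taken from any real libc.  The name my_memcpy and the word_t typedef are
made up for the example, and it assumes GCC or Clang (for the may_alias
attribute) and a target where unsigned long is the natural word size.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: word_t accesses may alias objects of any type. */
typedef unsigned long __attribute__((__may_alias__)) word_t;

/* Hypothetical name; a real libc would export this as memcpy(). */
void *my_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Use the word loop only when both pointers are word-aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) % sizeof(word_t)) == 0) {
        for (; n >= sizeof(word_t); n -= sizeof(word_t)) {
            *(word_t *)d = *(const word_t *)s;
            d += sizeof(word_t);
            s += sizeof(word_t);
        }
    }

    /* Byte loop: copies the tail, or everything if alignment was off. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

Two caveats if something like this really becomes the libc memcpy().
GCC can recognise such loops and turn them back into a call to memcpy,
so the file is usually built with -fno-tree-loop-distribute-patterns.
And the __builtin_memcpy() variant only works when the compiler expands
the builtin inline for that target; if it emits a call instead, the
function just calls itself.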

Segher