On Fri, 29 Aug 2025 at 17:30, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
>
> On Fri, Aug 29, 2025 at 03:08:56PM +0200, Honza Fikar wrote:
> > On Fri, Aug 29, 2025 at 2:54 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> >
> > > Currently, BLAKE2s support is always enabled ('obj-y'), since random.c
> > > uses it. Therefore, the arch-optimized BLAKE2s code, which exists for
> > > ARM and x86_64, should be always enabled too.
> >
> > Maybe a stupid question: what about ARM64? The current NEON
> > implementation in kernel arch/arm/crypto/blake2s-core.S seems to be just
> > for ARM.
> >

That code is scalar not NEON, and is carefully tuned to make use of the
ARM barrel shifter, which does not exist on arm64.

> > While the upstream BLAKE2s with NEON is both for ARM and Aarch64 (ARM64):
> >
> > https://github.com/BLAKE2/BLAKE2/blob/master/neon
>
> There's no ARM64 optimized BLAKE2s code in the Linux kernel yet. If
> it's useful, someone would need to contribute it.
>

NEON is cumbersome in the kernel so this only makes sense if it is
substantially more performant, and I'm skeptical that this is the
case, as you pointed out yourself in

commit 5172d322d34c30fb926b29aeb5a064e1fd8a5e13
Author: Eric Biggers <ebiggers@xxxxxxxxxx>
Date:   Wed Dec 23 00:09:59 2020 -0800

    crypto: arm/blake2s - add ARM scalar optimized BLAKE2s

    Add an ARM scalar optimized implementation of BLAKE2s.

    NEON isn't very useful for BLAKE2s because the BLAKE2s block size is too
    small for NEON to help. Each NEON instruction would depend on the
    previous one, resulting in poor performance.

    Even if NEON code might be slightly faster on some cores, the fact that
    it is sensitive to micro-architectural details makes it less attractive.
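
For anyone following along, the dependency chain that commit message talks
about can be seen in the BLAKE2s G mixing function from RFC 7693. Below is
a minimal C sketch of it, not the kernel's code: the names rotr32() and
blake2s_g() are made up for this illustration. Within one G call, each
statement consumes the result of the statement before it, and every second
step is a rotate applied to a value that was just XORed, which is the kind
of pattern the scalar ARM code's use of the barrel shifter is aimed at.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (n must be in 1..31). */
static inline uint32_t rotr32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

/*
 * BLAKE2s G function as given in RFC 7693 (rotation counts 16, 12, 8, 7).
 * v is the 16-word working state; a, b, c, d are indices into it and
 * x, y are two message words.  Note that each line depends on the line
 * directly above it.
 */
static void blake2s_g(uint32_t v[16], int a, int b, int c, int d,
		      uint32_t x, uint32_t y)
{
	v[a] = v[a] + v[b] + x;
	v[d] = rotr32(v[d] ^ v[a], 16);
	v[c] = v[c] + v[d];
	v[b] = rotr32(v[b] ^ v[c], 12);
	v[a] = v[a] + v[b] + y;
	v[d] = rotr32(v[d] ^ v[a], 8);
	v[c] = v[c] + v[d];
	v[b] = rotr32(v[b] ^ v[c], 7);
}
```

A round calls this four times on the columns and four times on the
diagonals of the 4x4 state, e.g. blake2s_g(v, 0, 4, 8, 12, m0, m1) for the
first column, where m0 and m1 stand for the two message words selected by
the round's permutation.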