Re: [GIT PULL] Crypto Update for 6.15

Eric Biggers <ebiggers@xxxxxxxxxx> · Sat, 29 Mar 2025 11:06:31 -0700

On Sat, Mar 29, 2025 at 10:40:23AM -0700, Linus Torvalds wrote:
> On Tue, 25 Mar 2025 at 08:25, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> >
> > Each hash request can also contain an entire scatterlist.  It's overkill for
> > what is actually needed for multibuffer hashing, which is a simple API that
> > hashes two buffers specified by virtual address.  Herbert's API creates lots of
> > unnecessary edge cases, most of which lack any testing.
> 
> Isn't that the whole *point* of the generic crypto layer?
> 
> Honestly, I think anybody who cares about modern CPU-based crypto
> should do what wireguard did: stop using the generic crypto layer,
> because it's fundamentally designed for odd async hardware in strange
> *legacy* models, and the whole basic design is around the indirection
> that allows different crypto engines.
> 
> Because that's the *point* of that code. I mean, a large part of the
> *design* of it is centered around having external crypto engines. And
> the thing you worry about is pretty much the opposite of that.
> 
> So if what you want is just fast modern crypto on the CPU, the generic
> interfaces are just odd and complicated.
> 
> Yes, they get less complicated if you limit yourself to the
> synchronous interfaces - which is, as you point out - why most people
> do exactly that.
> 
> Put another way: I don't disagree with you, but at the same time my
> reaction is that the generic crypto layer does what it has always
> done.
> 
> I get the feeling that you are arguing for avoiding the overheads and
> abstractions, and I'm not disagreeing. But overheads and abstractions
> is what that crypto layer is *for*.
> 
> I mean, you can do
> 
>         tfm = crypto_alloc_shash("crc32c", 0, 0);
> 
> and jump through the crazy hoops with the indirection of going through
> that tfm ("transformation object") that allocates a lot of extra info
> and works with other things. And it's designed to work with various
> non-CPU addresses etc.
> 
> Or you can just do
> 
>         crc = crc32c(crc, virt, cur_len);
> 
> and you're done - at the cost of only working with regular virtually
> mapped addresses. Your choice.
> 
> So I think you want to do the wireguard thing, and use the fixed and
> simple cases.
> 
> Yes, those interfaces only exist for a subset of things, but I think
> that subset of things is (a) the relevant subset and (b) the ones
> you'd do the whole parallel execution for anyway (afaik you did
> sha256).

The crypto_shash API is synchronous and operates on virtual addresses.  So it
just provides a simple way to support multiple hash algorithms, and none of the
legacy asynchronous hardware offload stuff.  It's crypto_ahash that has that.

Multibuffer hashing (interleaving multiple hashes) is CPU-based, and it requires
that all the lengths be synced up for it to work, which makes it very difficult
to support scatterlists.  So considering just crypto_shash and crypto_ahash, it
really belongs in crypto_shash (whereas Herbert wants it to go in crypto_ahash).

You're correct that it could go in a SHA-256 library function instead of either
crypto_shash or crypto_ahash.  I think it would be slightly more convenient to
have it in crypto_shash, since the users that want this (dm-verity and
fs-verity) do support multiple hash algorithms and appreciate having the
*simple* abstraction layer of crypto_shash.

But I'd be okay with having a separate code path for SHA-256 too, and maybe this
is the best way out of this...  No need to use the "Crypto API" at all if it's
not going to provide what is needed.

- Eric