Re: [PATCH v4] crypto: riscv/poly1305 - import OpenSSL/CRYPTOGAMS implementation

Zhihang Shao <zhihang.shao.iscas@xxxxxxxxx> · Sun, 20 Jul 2025 17:10:13 +0800

Hi Eric,

I recently ran the KUnit module you wrote for testing poly1305 on
QEMU RISC-V 64. The results show a significant performance improvement
of the optimized implementation over the generic one: throughput rises
from 61 to 78 MB/s at len=16 (about 1.3x) and from 1216 to 2267 MB/s
at len=16384 (about 1.9x). The test data are as follows:

--- base.log    2025-07-19 17:41:06.443392989 +0800
+++ optimized.log       2025-07-19 17:40:45.650048601 +0800
@@ -1,31 +1,31 @@
-[    0.668631]     # Subtest: poly1305
-[    0.668774]     # module: poly1305_kunit
-[    0.668857]     1..12
-[    0.670267]     ok 1 test_hash_test_vectors
-[    0.679479]     ok 2 test_hash_all_lens_up_to_4096
-[    0.696048]     ok 3 test_hash_incremental_updates
-[    0.697645]     ok 4 test_hash_buffer_overruns
-[    0.701060]     ok 5 test_hash_overlaps
-[    0.702858]     ok 6 test_hash_alignment_consistency
-[    0.703108]     ok 7 test_hash_ctx_zeroization
-[    0.846150]     ok 8 test_hash_interrupt_context_1
-[    1.235247]     ok 9 test_hash_interrupt_context_2
-[    1.250813]     ok 10 test_poly1305_allones_keys_and_message
-[    1.251138]     ok 11 test_poly1305_reduction_edge_cases
-[    1.287196]     # benchmark_hash: len=1: 2 MB/s
-[    1.305363]     # benchmark_hash: len=16: 61 MB/s
-[    1.321102]     # benchmark_hash: len=64: 212 MB/s
-[    1.340105]     # benchmark_hash: len=127: 263 MB/s
-[    1.353880]     # benchmark_hash: len=128: 364 MB/s
-[    1.370118]     # benchmark_hash: len=200: 377 MB/s
-[    1.381879]     # benchmark_hash: len=256: 570 MB/s
-[    1.394125]     # benchmark_hash: len=511: 657 MB/s
-[    1.404265]     # benchmark_hash: len=512: 794 MB/s
-[    1.413356]     # benchmark_hash: len=1024: 985 MB/s
-[    1.421925]     # benchmark_hash: len=3173: 1131 MB/s
-[    1.429956]     # benchmark_hash: len=4096: 1218 MB/s
-[    1.438184]     # benchmark_hash: len=16384: 1216 MB/s
-[    1.438462]     ok 12 benchmark_hash
-[    1.438686] # poly1305: pass:12 fail:0 skip:0 total:12
-[    1.438763] # Totals: pass:12 fail:0 skip:0 total:12
-[    1.438904] ok 1 poly1305
+[    0.666280]     # Subtest: poly1305
+[    0.666413]     # module: poly1305_kunit
+[    0.666490]     1..12
+[    0.667702]     ok 1 test_hash_test_vectors
+[    0.672896]     ok 2 test_hash_all_lens_up_to_4096
+[    0.686244]     ok 3 test_hash_incremental_updates
+[    0.687263]     ok 4 test_hash_buffer_overruns
+[    0.689957]     ok 5 test_hash_overlaps
+[    0.691393]     ok 6 test_hash_alignment_consistency
+[    0.691622]     ok 7 test_hash_ctx_zeroization
+[    0.769741]     ok 8 test_hash_interrupt_context_1
+[    0.930832]     ok 9 test_hash_interrupt_context_2
+[    0.940068]     ok 10 test_poly1305_allones_keys_and_message
+[    0.940478]     ok 11 test_poly1305_reduction_edge_cases
+[    0.964546]     # benchmark_hash: len=1: 3 MB/s
+[    0.978836]     # benchmark_hash: len=16: 78 MB/s
+[    0.990414]     # benchmark_hash: len=64: 289 MB/s
+[    1.003012]     # benchmark_hash: len=127: 397 MB/s
+[    1.012755]     # benchmark_hash: len=128: 517 MB/s
+[    1.022928]     # benchmark_hash: len=200: 603 MB/s
+[    1.030981]     # benchmark_hash: len=256: 835 MB/s
+[    1.038706]     # benchmark_hash: len=511: 1046 MB/s
+[    1.045233]     # benchmark_hash: len=512: 1240 MB/s
+[    1.050733]     # benchmark_hash: len=1024: 1638 MB/s
+[    1.055620]     # benchmark_hash: len=3173: 1998 MB/s
+[    1.060247]     # benchmark_hash: len=4096: 2132 MB/s
+[    1.064695]     # benchmark_hash: len=16384: 2267 MB/s
+[    1.065179]     ok 12 benchmark_hash
+[    1.065425] # poly1305: pass:12 fail:0 skip:0 total:12
+[    1.065498] # Totals: pass:12 fail:0 skip:0 total:12
+[    1.065612] ok 1 poly1305
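
For reference, the per-length speedup can be derived from the two logs
with a small script along the following lines. This is only a sketch:
the helper names (parse_throughput, speedups) are illustrative and not
part of the KUnit suite, and the MB/s figures are copied from the
benchmark_hash lines above with the timestamps dropped.

```python
import re

# benchmark_hash lines copied from base.log (generic implementation).
BASE_LOG = """
# benchmark_hash: len=1: 2 MB/s
# benchmark_hash: len=16: 61 MB/s
# benchmark_hash: len=64: 212 MB/s
# benchmark_hash: len=127: 263 MB/s
# benchmark_hash: len=128: 364 MB/s
# benchmark_hash: len=200: 377 MB/s
# benchmark_hash: len=256: 570 MB/s
# benchmark_hash: len=511: 657 MB/s
# benchmark_hash: len=512: 794 MB/s
# benchmark_hash: len=1024: 985 MB/s
# benchmark_hash: len=3173: 1131 MB/s
# benchmark_hash: len=4096: 1218 MB/s
# benchmark_hash: len=16384: 1216 MB/s
"""

# benchmark_hash lines copied from optimized.log.
OPT_LOG = """
# benchmark_hash: len=1: 3 MB/s
# benchmark_hash: len=16: 78 MB/s
# benchmark_hash: len=64: 289 MB/s
# benchmark_hash: len=127: 397 MB/s
# benchmark_hash: len=128: 517 MB/s
# benchmark_hash: len=200: 603 MB/s
# benchmark_hash: len=256: 835 MB/s
# benchmark_hash: len=511: 1046 MB/s
# benchmark_hash: len=512: 1240 MB/s
# benchmark_hash: len=1024: 1638 MB/s
# benchmark_hash: len=3173: 1998 MB/s
# benchmark_hash: len=4096: 2132 MB/s
# benchmark_hash: len=16384: 2267 MB/s
"""

PATTERN = re.compile(r"benchmark_hash: len=(\d+): (\d+) MB/s")


def parse_throughput(log):
    """Map message length in bytes to MB/s for each benchmark_hash line."""
    return {int(length): int(mbps) for length, mbps in PATTERN.findall(log)}


def speedups(base_log, opt_log):
    """Return optimized/base throughput ratio for lengths present in both logs."""
    base = parse_throughput(base_log)
    opt = parse_throughput(opt_log)
    return {length: opt[length] / base[length] for length in base if length in opt}


if __name__ == "__main__":
    for length, ratio in speedups(BASE_LOG, OPT_LOG).items():
        print(f"len={length:>5}: {ratio:.2f}x")
```

The len=1 ratio is coarse because the benchmark reports whole MB/s
(2 vs. 3), so the larger lengths are the more meaningful comparison.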

Next, I plan to validate this performance gain on actual RISC-V
hardware. I will also submit a v5 patch to the mailing list.
I look forward to your feedback and suggestions.

- Zhihang