On 5/13/25 13:39, Alexandre Ghiti wrote:
Hi Chunyan,
On 08/05/2025 09:14, Chunyan Zhang wrote:
Hi Palmer,
On Mon, 31 Mar 2025 at 23:55, Palmer Dabbelt <palmer@xxxxxxxxxxx> wrote:
On Wed, 05 Mar 2025 00:37:06 PST (-0800), zhangchunyan@xxxxxxxxxxx
wrote:
The assembly is originally based on the ARM NEON and int.uc
implementations, but uses RISC-V vector instructions to implement the
RAID6 syndrome and recovery calculations.
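(For readers who want the underlying math: below is a minimal scalar C
sketch of the P/Q syndrome loop, in the spirit of lib/raid6/int.uc. It
is not the RVV assembly from this patch, just the byte-level GF(2^8)
computation that the vector code unrolls and vectorizes; the helper
names are mine.)

#include <stdint.h>
#include <stddef.h>

/* Multiply a GF(2^8) byte by 2, the RAID6 generator (polynomial 0x11d). */
static inline uint8_t gf2_mul2(uint8_t v)
{
    return (v << 1) ^ ((v & 0x80) ? 0x1d : 0);
}

/*
 * dptrs[0..disks-3] are the data blocks, dptrs[disks-2] is P and
 * dptrs[disks-1] is Q, each 'bytes' long.
 */
static void raid6_gen_syndrome_ref(int disks, size_t bytes, uint8_t **dptrs)
{
    uint8_t *p = dptrs[disks - 2];
    uint8_t *q = dptrs[disks - 1];
    int z0 = disks - 3;                    /* highest data disk */

    for (size_t i = 0; i < bytes; i++) {
        uint8_t wp = dptrs[z0][i];         /* running P: plain XOR */
        uint8_t wq = wp;                   /* running Q: Horner with x2 steps */

        for (int z = z0 - 1; z >= 0; z--) {
            wq = gf2_mul2(wq) ^ dptrs[z][i];
            wp ^= dptrs[z][i];
        }
        p[i] = wp;
        q[i] = wq;
    }
}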
The functions are tested on QEMU running with the option "-icount
shift=0":
Does anyone have hardware benchmarks for this? There's a lot more code
here than the other targets have. If all that unrolling is necessary
for performance on real hardware then it seems fine to me, but just
having it for QEMU doesn't really tell us much.
I ran tests on a Banana Pi BPI-F3 and a Canaan K230.
The BPI-F3 is built around the SpacemiT K1 8-core RISC-V chip; the test
result on the BPI-F3 was:
raid6: rvvx1 gen() 2916 MB/s
raid6: rvvx2 gen() 2986 MB/s
raid6: rvvx4 gen() 2975 MB/s
raid6: rvvx8 gen() 2763 MB/s
raid6: int64x8 gen() 1571 MB/s
raid6: int64x4 gen() 1741 MB/s
raid6: int64x2 gen() 1639 MB/s
raid6: int64x1 gen() 1394 MB/s
raid6: using algorithm rvvx2 gen() 2986 MB/s
raid6: .... xor() 2 MB/s, rmw enabled
raid6: using rvv recovery algorithm
So I'm playing with my new BananaPi and I got the following numbers:
[ 0.628134] raid6: int64x8 gen() 1074 MB/s
[ 0.696263] raid6: int64x4 gen() 1574 MB/s
[ 0.764383] raid6: int64x2 gen() 1677 MB/s
[ 0.832504] raid6: int64x1 gen() 1387 MB/s
[ 0.833824] raid6: using algorithm int64x2 gen() 1677 MB/s
[ 0.907378] raid6: .... xor() 829 MB/s, rmw enabled
[ 0.909301] raid6: using intx1 recovery algorithm
So I realize that you provided the numbers I asked for... Sorry about
that. That's a very nice improvement, well done.
I'll add your patch as-is for 6.16.
Thanks again,
Alex
The K230 uses a dual-core XuanTie C908 processor, with the larger core
featuring the RVV 1.0 extension; the test result on the K230 was:
raid6: rvvx1 gen() 1556 MB/s
raid6: rvvx2 gen() 1576 MB/s
raid6: rvvx4 gen() 1590 MB/s
raid6: rvvx8 gen() 1491 MB/s
raid6: int64x8 gen() 1142 MB/s
raid6: int64x4 gen() 1628 MB/s
raid6: int64x2 gen() 1651 MB/s
raid6: int64x1 gen() 1391 MB/s
raid6: using algorithm int64x2 gen() 1651 MB/s
raid6: .... xor() 879 MB/s, rmw enabled
raid6: using rvv recovery algorithm
Among the RVV variants, the fastest unrolling was rvvx2 on the BPI-F3
and rvvx4 on the K230.
I only have these two RVV boards for now, so I have no test data from
other systems; I'm not sure whether rvvx8 will be needed on some
hardware or in other system environments.
Can we have a comparison from before and after applying your patch?
In addition, how do you check the correctness of your implementation?
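(For context, not an answer on Chunyan's behalf: the kernel tree ships
a user-space harness under lib/raid6/test that cross-checks the gen()
implementations and the recovery paths, and a stripped-down cross-check
in the same spirit might look like the sketch below. The *_ref()/*_rvv()
names are placeholders for a known-good reference and the
implementation under test.)

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define NDISKS  8
#define BLKSZ   4096

/* Placeholder prototypes: reference implementation vs. code under test. */
void raid6_gen_syndrome_ref(int disks, size_t bytes, uint8_t **dptrs);
void raid6_gen_syndrome_rvv(int disks, size_t bytes, uint8_t **dptrs);

int main(void)
{
    static uint8_t a[NDISKS][BLKSZ], b[NDISKS][BLKSZ];
    uint8_t *da[NDISKS], *db[NDISKS];

    /* Fill both copies of the data disks with identical random contents. */
    for (int i = 0; i < NDISKS; i++) {
        for (int j = 0; j < BLKSZ; j++)
            a[i][j] = b[i][j] = rand() & 0xff;
        da[i] = a[i];
        db[i] = b[i];
    }

    raid6_gen_syndrome_ref(NDISKS, BLKSZ, da);
    raid6_gen_syndrome_rvv(NDISKS, BLKSZ, db);

    /* P and Q must match the reference byte for byte. */
    assert(!memcmp(da[NDISKS - 2], db[NDISKS - 2], BLKSZ));
    assert(!memcmp(da[NDISKS - 1], db[NDISKS - 1], BLKSZ));
    return 0;
}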
I'll add whatever numbers you provide to the commit log and merge your
patch for 6.16.
Thanks a lot,
Alex
Thanks,
Chunyan
_______________________________________________
linux-riscv mailing list
linux-riscv@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-riscv