On 5/13/25 13:39, Alexandre Ghiti wrote:
Hi Chunyan,
On 08/05/2025 09:14, Chunyan Zhang wrote:
Hi Palmer,
On Mon, 31 Mar 2025 at 23:55, Palmer Dabbelt <palmer@xxxxxxxxxxx> wrote:
On Wed, 05 Mar 2025 00:37:06 PST (-0800), zhangchunyan@xxxxxxxxxxx
wrote:
The assembly is originally based on the ARM NEON and int.uc
implementations, but uses RISC-V vector instructions to implement the
RAID6 syndrome and recovery calculations.
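(For readers who want the underlying math: below is a minimal scalar C
sketch of the P/Q syndrome loop, in the spirit of lib/raid6/int.uc. It
is not the RVV assembly from this patch, just the byte-level GF(2^8)
computation that the vector code unrolls and vectorizes; the helper
names are mine.)

#include <stdint.h>
#include <stddef.h>

/* Multiply a GF(2^8) byte by 2, the RAID6 generator (polynomial 0x11d). */
static inline uint8_t gf2_mul2(uint8_t v)
{
    return (v << 1) ^ ((v & 0x80) ? 0x1d : 0);
}

/*
 * dptrs[0..disks-3] are the data blocks, dptrs[disks-2] is P and
 * dptrs[disks-1] is Q, each 'bytes' long.
 */
static void raid6_gen_syndrome_ref(int disks, size_t bytes, uint8_t **dptrs)
{
    uint8_t *p = dptrs[disks - 2];
    uint8_t *q = dptrs[disks - 1];
    int z0 = disks - 3;                    /* highest data disk */

    for (size_t i = 0; i < bytes; i++) {
        uint8_t wp = dptrs[z0][i];         /* running P: plain XOR */
        uint8_t wq = wp;                   /* running Q: Horner with x2 steps */

        for (int z = z0 - 1; z >= 0; z--) {
            wq = gf2_mul2(wq) ^ dptrs[z][i];
            wp ^= dptrs[z][i];
        }
        p[i] = wp;
        q[i] = wq;
    }
}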
The functions are tested on QEMU running with the option "-icount
shift=0":
Does anyone have hardware benchmarks for this? There's a lot more code
here than the other targets have. If all that unrolling is necessary
for performance on real hardware then it seems fine to me, but just
having it for QEMU doesn't really tell us much.
I ran tests on a Banana Pi BPI-F3 and a Canaan K230.
The BPI-F3 is built around the SpacemiT K1 8-core RISC-V chip; the test
result on the BPI-F3 was:
raid6: rvvx1 gen() 2916 MB/s
raid6: rvvx2 gen() 2986 MB/s
raid6: rvvx4 gen() 2975 MB/s
raid6: rvvx8 gen() 2763 MB/s
raid6: int64x8 gen() 1571 MB/s
raid6: int64x4 gen() 1741 MB/s
raid6: int64x2 gen() 1639 MB/s
raid6: int64x1 gen() 1394 MB/s
raid6: using algorithm rvvx2 gen() 2986 MB/s
raid6: .... xor() 2 MB/s, rmw enabled
raid6: using rvv recovery algorithm
So I'm playing with my new BananaPi and I got the following numbers:
[ 0.628134] raid6: int64x8 gen() 1074 MB/s
[ 0.696263] raid6: int64x4 gen() 1574 MB/s
[ 0.764383] raid6: int64x2 gen() 1677 MB/s
[ 0.832504] raid6: int64x1 gen() 1387 MB/s
[ 0.833824] raid6: using algorithm int64x2 gen() 1677 MB/s
[ 0.907378] raid6: .... xor() 829 MB/s, rmw enabled
[ 0.909301] raid6: using intx1 recovery algorithm
So I realize that you provided the numbers I asked for... Sorry about
that. That's a very nice improvement, well done.
I'll add your patch as-is for 6.16.
Thanks again,
Alex
The K230 uses a dual-core XuanTie C908 processor, with the larger core
featuring the RVV 1.0 extension; the test result on the K230 was:
raid6: rvvx1 gen() 1556 MB/s
raid6: rvvx2 gen() 1576 MB/s
raid6: rvvx4 gen() 1590 MB/s
raid6: rvvx8 gen() 1491 MB/s
raid6: int64x8 gen() 1142 MB/s
raid6: int64x4 gen() 1628 MB/s
raid6: int64x2 gen() 1651 MB/s
raid6: int64x1 gen() 1391 MB/s
raid6: using algorithm int64x2 gen() 1651 MB/s
raid6: .... xor() 879 MB/s, rmw enabled
raid6: using rvv recovery algorithm
Among the RVV variants, the fastest unrolling was rvvx2 on the BPI-F3
and rvvx4 on the K230.
I only have these two RVV boards for now, so I have no test data from
other systems; I'm not sure whether rvvx8 will be needed on some
hardware or in other system environments.
Can we have a comparison from before and after applying your patch?
In addition, how do you check the correctness of your implementation?
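(For context, not an answer on Chunyan's behalf: the kernel tree ships
a user-space harness under lib/raid6/test that cross-checks the gen()
implementations and the recovery paths, and a stripped-down cross-check
in the same spirit might look like the sketch below. The *_ref()/*_rvv()
names are placeholders for a known-good reference and the
implementation under test.)

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define NDISKS  8
#define BLKSZ   4096

/* Placeholder prototypes: reference implementation vs. code under test. */
void raid6_gen_syndrome_ref(int disks, size_t bytes, uint8_t **dptrs);
void raid6_gen_syndrome_rvv(int disks, size_t bytes, uint8_t **dptrs);

int main(void)
{
    static uint8_t a[NDISKS][BLKSZ], b[NDISKS][BLKSZ];
    uint8_t *da[NDISKS], *db[NDISKS];

    /* Fill both copies of the data disks with identical random contents. */
    for (int i = 0; i < NDISKS; i++) {
        for (int j = 0; j < BLKSZ; j++)
            a[i][j] = b[i][j] = rand() & 0xff;
        da[i] = a[i];
        db[i] = b[i];
    }

    raid6_gen_syndrome_ref(NDISKS, BLKSZ, da);
    raid6_gen_syndrome_rvv(NDISKS, BLKSZ, db);

    /* P and Q must match the reference byte for byte. */
    assert(!memcmp(da[NDISKS - 2], db[NDISKS - 2], BLKSZ));
    assert(!memcmp(da[NDISKS - 1], db[NDISKS - 1], BLKSZ));
    return 0;
}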
I'll add whatever numbers you provide to the commit log and merge your
patch for 6.16.
Thanks a lot,
Alex
Thanks,
Chunyan
_______________________________________________
linux-riscv mailing list
linux-riscv@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-riscv