On Tue, May 27, 2025 at 10:32:00AM +0800, Gao Xiang wrote:
>
>
> On 2025/5/8 12:19, Eric Biggers wrote:
>
> ...
>
> >
> > BTW, I also have to wonder why this patchset is proposing accelerating zlib
> > instead of Zstandard.  Zstandard is a much more modern algorithm.
>
> I think simply because QAT doesn't support the Zstandard native offload.
> At least, for Intel Xeon Sapphire Rapids processors (it seems to have
> built-in QAT 4xxx), only LZ4 and deflate-family are natively supported.
>
> I've confirmed that SPR QAT deflate hardware decompression already surpasses
> LZ4 software decompression on our cloud server setup, which is useful since
> it greatly improves decompression performance (even compared to software LZ4)
> and saves CPU overhead completely.

Does this measure the overall time of decompression, including the setup
steps (scatter/gather or similar, allocating requests, waiting, etc.)?
I mean comparing that to the library calls plus the input page
iteration.

I haven't found any public benchmarks with QAT-enabled compression.
I'm interested in how it's benchmarked because we've had people pointing
out that LZ4 itself is very fast, but that the overhead, once taken into
account, reduces the overall performance.

Thanks.