On Wed, May 14, 2025 at 4:57 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Tue, 13 May 2025 at 23:29, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>
> > The results vary depending on how IO-intensive the server-side
> > processing logic is (eg ones that are not as intensive would show a
> > bigger relative performance speedup than ones where a lot of time is
> > spent on server-side processing). I can include the results from
> > benchmarks on our internal fuse server, which forwards the data in the
> > write buffer to a remote server over the network. For that, we saw
> > roughly a 5% improvement in throughput for 5 GB writes with 16 MB
> > chunk sizes, and a 2.45% improvement in throughput for 12 parallel
> > writes of 16 GB files with 64 MB chunk sizes.
>
> Okay, those are much saner numbers.

Sorry, I wasn't intentionally trying to inflate the numbers. I thought
isolating the speedup to the kernel was the most accurate approach;
otherwise the numbers depend on the individual server implementation.

>
> Does the server use MSG_ZEROCOPY?

No. The server copies the buffer to another buffer (for later
processing) so that it can immediately reply to the request and not
hold up work on that libfuse thread. Splice helps here because it gets
rid of one copy: instead of copying the data to the libfuse buffer and
then from the libfuse buffer to this other buffer, we can now just do
a read() on the file descriptor the data was spliced into, directly
into the other buffer.

>
> Can you please include these numbers and the details on how the server
> takes advantage of splice in the patch header?

I will resubmit this as v3 with the numbers and details included in
the patch header underneath the commit message.

Thanks,
Joanne

>
> Thanks,
> Miklos
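
For illustration, a minimal sketch of the single-copy path described
above (not the actual server code), assuming the server splices the
incoming write payload from /dev/fuse into a pipe. The names fuse_fd,
pipe_fds, dst, and pull_payload are hypothetical, and FUSE
request-header handling is omitted for brevity:

/*
 *   without splice:  /dev/fuse -> libfuse buffer -> dst   (two copies)
 *   with splice:     /dev/fuse -> pipe           -> dst   (one copy)
 */
#define _GNU_SOURCE
#include <fcntl.h>      /* splice(), SPLICE_F_MOVE */
#include <unistd.h>     /* read() */

static ssize_t pull_payload(int fuse_fd, int pipe_fds[2],
                            char *dst, size_t len)
{
        /* Move the data out of /dev/fuse into the pipe without
         * copying it through a userspace buffer. */
        ssize_t n = splice(fuse_fd, NULL, pipe_fds[1], NULL,
                           len, SPLICE_F_MOVE);
        if (n <= 0)
                return n;

        /* The single remaining copy: pipe -> destination buffer. */
        return read(pipe_fds[0], dst, (size_t)n);
}

Under the old path the server would read() from /dev/fuse into the
libfuse buffer and then copy again into dst; here splice moves the
pages into the pipe without a userspace copy, so only the
pipe-to-destination read remains.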