On Mon, May 12, 2025 at 10:46 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Mon, 12 May 2025 at 21:03, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >
> > On Wed, May 7, 2025 at 7:45 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > >
> > > On Wed, 23 Apr 2025 at 01:56, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> > > > For servers that do not need to access pages after answering the
> > > > request, splice gives a non-trivial improvement in performance.
> > > > Benchmarks show roughly a 40% speedup.
> > >
> > > Hmm, have you looked at where this speedup comes from?
> > >
> > > Is this a real zero-copy scenario where the server just forwards the
> > > pages to a driver which does DMA, so that the CPU never actually
> > > touches the page contents?
> >
> > I ran the benchmarks last month on the passthrough_ll server (from the
> > libfuse examples) with the actual copying out / buffer processing
> > removed (eg the .write_buf handler immediately returns
> > "fuse_reply_write(req, fuse_buf_size(in_buf));").
> Ah, ok.
>
> It would be good to see results in a more realistic scenario than that
> before deciding to do this.

The results vary depending on how IO-intensive the server-side
processing logic is (eg servers that do little processing show a
bigger relative speedup than ones where a lot of time is spent on
server-side processing). I can include the results from benchmarks on
our internal fuse server, which forwards the data in the write buffer
to a remote server over the network. For that server, we saw roughly a
5% improvement in throughput for 5 GB writes with 16 MB chunk sizes,
and a 2.45% improvement in throughput for 12 parallel writes of 16 GB
files with 64 MB chunk sizes.

Thanks,
Joanne

>
> Thanks,
> Miklos
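
For reference, a minimal sketch of the kind of no-op .write_buf handler
described in the quoted text above (the handler name and surrounding
boilerplate are illustrative; only the fuse_reply_write()/fuse_buf_size()
calls come from the thread, and the actual modified passthrough_ll code
may differ):

#define FUSE_USE_VERSION 34
#include <fuse_lowlevel.h>

/* Installed as the .write_buf member of struct fuse_lowlevel_ops. */
static void no_copy_write_buf(fuse_req_t req, fuse_ino_t ino,
                              struct fuse_bufvec *in_buf, off_t off,
                              struct fuse_file_info *fi)
{
        (void) ino;
        (void) off;
        (void) fi;

        /* Acknowledge the full write immediately, without copying the
         * data out of the FUSE buffer, so the server never touches the
         * page contents. */
        fuse_reply_write(req, fuse_buf_size(in_buf));
}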