On Tue, 2025-05-06 at 14:16 -0400, Chuck Lever wrote:
> On 5/6/25 1:40 PM, Jeff Layton wrote:
> > FYI I decided to try and get some numbers with Mike's RWF_DONTCACHE
> > patches for nfsd [1]. Those add a module param that makes all reads
> > and writes use RWF_DONTCACHE.
> > 
> > I had one host that was running knfsd with an XFS export, and a
> > second that was acting as NFS client. Both machines have tons of
> > memory, so pagecache utilization is irrelevant for this test.
> > 
> > I tested sequential writes using the fio-seq-write.fio test, both
> > with and without the module param enabled.
> > 
> > These numbers are from one run each, but they were pretty stable
> > over several runs:
> > 
> > # fio /usr/share/doc/fio/examples/fio-seq-write.fio
> > 
> > wsize=1M:
> > 
> > Normal: WRITE: bw=1034MiB/s (1084MB/s), 1034MiB/s-1034MiB/s (1084MB/s-1084MB/s), io=910GiB (977GB), run=901326-901326msec
> > DONTCACHE: WRITE: bw=649MiB/s (681MB/s), 649MiB/s-649MiB/s (681MB/s-681MB/s), io=571GiB (613GB), run=900001-900001msec
> > 
> > DONTCACHE with a 1M wsize vs. recent (v6.14-ish) knfsd was about 30%
> > slower. Memory consumption was down, but these boxes have oodles of
> > memory, so I didn't notice much change there.
> > 
> > Chris suggested that the write sizes were too small in this test, so
> > I grabbed Chuck's patches to increase the max RPC payload size [2]
> > to 4M, and patched the client to allow a wsize that big:
> > 
> > wsize=4M:
> > 
> > Normal: WRITE: bw=1053MiB/s (1104MB/s), 1053MiB/s-1053MiB/s (1104MB/s-1104MB/s), io=930GiB (999GB), run=904526-904526msec
> > DONTCACHE: WRITE: bw=1191MiB/s (1249MB/s), 1191MiB/s-1191MiB/s (1249MB/s-1249MB/s), io=1050GiB (1127GB), run=902781-902781msec
> > 
> > Not much change with normal buffered I/O here, but DONTCACHE is
> > faster with a 4M wsize. My suspicion (unconfirmed) is that the
> > dropbehind flag ends up causing partially-written large folios in
> > the pagecache to get written back too early, and that slows down
> > later writes to the same folios.
> 
> My feeling is that at this point, the NFSD read and write paths are
> not currently tuned for large folios -- they break every I/O into
> single pages.
> 

*nod*

> > I wonder if we need some heuristic that makes generic_write_sync()
> > only kick off writeback immediately if the whole folio is dirty, so
> > we have more time to gather writes before kicking off writeback?
> 
> Mike has suggested that NFSD should limit the use of RWF_UNCACHED to
> WRITE requests with large payloads (for some arbitrary definition of
> "large").
> 

Yeah. I think we need something along those lines.

> > This might also be a good reason to think about a larger rsize/wsize
> > limit in the client.
> > 
> > I'd like to also test reads with this flag, but I'm currently
> > getting back that EOPNOTSUPP error when I try to test them.
> 
> That's expected for that patch series.

Yep, I figured.

> But I have to ask: what problem do you expect RWF_UNCACHED to solve?
> 

I don't have a problem to solve, per se. I was mainly just wondering
what sort of effect RWF_DONTCACHE and larger payloads would have on
performance.

> > [1]: https://lore.kernel.org/linux-nfs/20250220171205.12092-1-snitzer@xxxxxxxxxx/
> > [2]: https://lore.kernel.org/linux-nfs/20250428193702.5186-15-cel@xxxxxxxxxx/

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
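
For illustration, here is a minimal sketch of the whole-folio heuristic
floated above. It assumes a hypothetical helper called from the
DONTCACHE write path; the function name and the call site are made up
for this example, and the real mm code may end up looking nothing like
it.

#include <linux/pagemap.h>

/*
 * Hedged sketch only: decide whether a DONTCACHE write should kick off
 * writeback immediately.  For a sequential writer, the folio is fully
 * populated once a write reaches the folio's last byte; until then,
 * defer writeback so later writes to the same folio can coalesce.
 */
static bool dontcache_folio_writeback_ready(struct folio *folio,
					    loff_t pos, size_t len)
{
	/* Does this write extend to (or past) the end of the folio? */
	return pos + len >= folio_pos(folio) + folio_size(folio);
}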
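
Similarly, a rough sketch of Mike's "large payloads only" suggestion as
it might look on the nfsd side. The 1MB threshold, the dontcache_enabled
knob standing in for the module param, and the helper name are all
placeholders for whatever the real patches settle on.

#include <linux/fs.h>

/* Placeholder threshold -- "large" is deliberately arbitrary here. */
#define NFSD_DONTCACHE_MIN_PAYLOAD	(1024 * 1024)

/*
 * Hedged sketch only: add RWF_DONTCACHE to a WRITE's rwf flags only
 * when the payload is big enough that dropbehind is likely to win.
 */
static rwf_t nfsd_write_rwf_flags(bool dontcache_enabled,
				  unsigned long payload_len)
{
	rwf_t flags = 0;

	if (dontcache_enabled && payload_len >= NFSD_DONTCACHE_MIN_PAYLOAD)
		flags |= RWF_DONTCACHE;

	return flags;
}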