Re: performance of nfsd with RWF_DONTCACHE and larger wsizes

On Wed, 2025-05-07 at 08:31 +1000, Dave Chinner wrote:
> On Tue, May 06, 2025 at 01:40:35PM -0400, Jeff Layton wrote:
> > FYI I decided to try and get some numbers with Mike's RWF_DONTCACHE
> > patches for nfsd [1]. Those add a module param that make all reads and
> > writes use RWF_DONTCACHE.
> > 
> > I had one host that was running knfsd with an XFS export, and a second
> > that was acting as NFS client. Both machines have tons of memory, so
> > pagecache utilization is irrelevant for this test.
> 
> Does RWF_DONTCACHE result in server side STABLE write requests from
> the NFS client, or are they still unstable and require a post-write
> completion COMMIT operation from the client to trigger server side
> writeback before the client can discard the page cache?
> 

The latter. I didn't change the client at all here (other than to allow
it to do bigger writes on the wire). It's just doing bog-standard
buffered I/O. nfsd is adding RWF_DONTCACHE to every write via Mike's
patch.

> > I tested sequential writes using the fio-seq_write.fio test, both with
> > and without the module param enabled.
> > 
> > These numbers are from one run each, but they were pretty stable over
> > several runs:
> > 
> > # fio /usr/share/doc/fio/examples/fio-seq-write.fio
> 
> $ cat /usr/share/doc/fio/examples/fio-seq-write.fio
> cat: /usr/share/doc/fio/examples/fio-seq-write.fio: No such file or directory
> $
> 
> What are the fio control parameters of the IO you are doing? (e.g.
> is this single threaded IO, does it use the psync, libaio or iouring
> engine, etc)
> 


; fio-seq-write.job for fiotest

[global]
name=fio-seq-write
filename=fio-seq-write
rw=write
bs=256K
direct=0
numjobs=1
time_based
runtime=900

[file1]
size=10G
ioengine=libaio
iodepth=16
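
For reference, the same workload can be expressed directly on the
command line (these should be the stock fio option names matching the
job file above):

fio --name=file1 --filename=fio-seq-write --rw=write --bs=256K \
    --direct=0 --numjobs=1 --time_based --runtime=900 \
    --size=10G --ioengine=libaio --iodepth=16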


> > wsize=1M:
> > 
> > Normal:      WRITE: bw=1034MiB/s (1084MB/s), 1034MiB/s-1034MiB/s (1084MB/s-1084MB/s), io=910GiB (977GB), run=901326-901326msec
> > DONTCACHE:   WRITE: bw=649MiB/s (681MB/s), 649MiB/s-649MiB/s (681MB/s-681MB/s), io=571GiB (613GB), run=900001-900001msec
> > 
> > With a 1M wsize, DONTCACHE was roughly 37% slower than recent
> > (v6.14-ish) knfsd without it. Memory consumption was down, but these
> > boxes have oodles of memory, so I didn't notice much change there.
> 
> So what is the IO pattern that the NFSD is sending to the underlying
> XFS filesystem?
> 
> Is it sending 1M RWF_DONTCACHE buffered IOs to XFS as well (i.e. one
> buffered write IO per NFS client write request), or is DONTCACHE
> only being used on the NFS client side?
> 

It should be sequential I/O, though the writes would be coming in
from different nfsd threads. nfsd just does standard buffered I/O: the
WRITE handler calls nfsd_vfs_write(), which calls vfs_iter_write().
With the module parameter enabled, it also adds RWF_DONTCACHE.
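
To illustrate, here's a minimal sketch of the shape of that change.
The parameter name and exact plumbing here are my guesses, not Mike's
actual patch:

/* hypothetical module knob; the real patch may name it differently */
static bool nfsd_dontcache;
module_param(nfsd_dontcache, bool, 0644);
MODULE_PARM_DESC(nfsd_dontcache,
		 "Use RWF_DONTCACHE for all nfsd reads and writes");

	/* ...then in the WRITE path, before issuing the buffered write: */
	rwf_t flags = 0;

	if (nfsd_dontcache)
		flags |= RWF_DONTCACHE; /* drop pagecache once writeback completes */
	host_err = vfs_iter_write(file, &iter, &pos, flags);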

DONTCACHE is only being used on the server side. To be clear, the
protocol doesn't support that flag (yet), so we have no way to
propagate DONTCACHE from the client to the server (yet). This is just
early exploration to see whether DONTCACHE offers any benefit to this
workload.

> > I wonder if we need some heuristic that makes generic_write_sync() only
> > kick off writeback immediately if the whole folio is dirty so we have
> > more time to gather writes before kicking off writeback?
> 
> You're doing aligned 1MB IOs - there should be no partially dirty
> large folios in either the client or the server page caches.
> 

Interesting. I wonder what accounts for the slowdown with 1M writes? It
seems likely to be related to the more aggressive writeback with
DONTCACHE enabled, but it'd be good to understand this.

> That said, this is part of the reason I asked both whether the
> client side write is STABLE and whether RWF_DONTCACHE is used on
> the server side. i.e. either of those will trigger writeback
> on the server side immediately; in the case of the former it will
> also complete before returning to the client and not require a
> subsequent COMMIT RPC to wait for server side IO completion...
> 

I need to go back and sniff traffic to be sure, but I'm fairly certain
the client is issuing regular UNSTABLE writes and following up with a
later COMMIT, at least for most of them. The occasional STABLE write
might end up getting through, but that should be fairly rare.
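
Something like this should confirm it (assuming the field names I
remember from Wireshark's NFSv3 dissector are right):

# capture NFS traffic on the client
tcpdump -i eth0 -s 0 -w nfs.pcap port 2049
# then dump the stable_how field of each WRITE:
#   0 == UNSTABLE, 1 == DATA_SYNC, 2 == FILE_SYNC
tshark -r nfs.pcap -Y 'nfs.write.stable' -T fields -e nfs.write.stable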

-- 
Jeff Layton <jlayton@xxxxxxxxxx>




