Chuck and I were discussing RWF_DONTCACHE and he suggested that this
might be an alternate approach. My main gripe with DONTCACHE was that it
kicks off writeback after every WRITE operation. With NFS, we generally
get a COMMIT operation at some point. Allowing us to batch up writes
until that point has traditionally been considered better for
performance.

Instead of RWF_DONTCACHE, this patch has nfsd issue
generic_fadvise(..., POSIX_FADV_DONTNEED) on the appropriate range after
any READ, stable WRITE or COMMIT operation. This means that it doesn't
change how and when dirty data gets flushed to the disk, but still keeps
resident pagecache to a minimum.

For reference, here are some numbers from a fio run doing sequential
reads and writes, with the server in "normal" buffered I/O mode, with
Mike's RWF_DONTCACHE patch enabled, and with fadvise(...DONTNEED).

Jobfile:

[global]
name=fio-seq-RW
filename=fio-seq-RW
rw=rw
rwmixread=60
rwmixwrite=40
bs=1M
direct=0
numjobs=16
time_based
runtime=300

[file1]
size=100G
ioengine=io_uring
iodepth=16

::::::::::::::::::::::::::::::::::::

3 runs each.

Baseline (nothing enabled):
Run status group 0 (all jobs):
   READ: bw=2999MiB/s (3144MB/s), 185MiB/s-189MiB/s (194MB/s-198MB/s), io=879GiB (944GB), run=300014-300087msec
  WRITE: bw=1998MiB/s (2095MB/s), 124MiB/s-126MiB/s (130MB/s-132MB/s), io=585GiB (629GB), run=300014-300087msec

   READ: bw=2866MiB/s (3005MB/s), 177MiB/s-181MiB/s (185MB/s-190MB/s), io=844GiB (906GB), run=301294-301463msec
  WRITE: bw=1909MiB/s (2002MB/s), 117MiB/s-121MiB/s (123MB/s-127MB/s), io=562GiB (604GB), run=301294-301463msec

   READ: bw=2885MiB/s (3026MB/s), 177MiB/s-183MiB/s (186MB/s-192MB/s), io=846GiB (908GB), run=300017-300117msec
  WRITE: bw=1923MiB/s (2016MB/s), 118MiB/s-122MiB/s (124MB/s-128MB/s), io=563GiB (605GB), run=300017-300117msec

RWF_DONTCACHE:
Run status group 0 (all jobs):
   READ: bw=3088MiB/s (3238MB/s), 189MiB/s-195MiB/s (198MB/s-205MB/s), io=906GiB (972GB), run=300015-300276msec
  WRITE: bw=2058MiB/s (2158MB/s), 126MiB/s-129MiB/s (132MB/s-136MB/s), io=604GiB (648GB), run=300015-300276msec

   READ: bw=3116MiB/s (3267MB/s), 191MiB/s-197MiB/s (201MB/s-206MB/s), io=913GiB (980GB), run=300022-300074msec
  WRITE: bw=2077MiB/s (2178MB/s), 128MiB/s-131MiB/s (134MB/s-137MB/s), io=609GiB (654GB), run=300022-300074msec

   READ: bw=3011MiB/s (3158MB/s), 185MiB/s-191MiB/s (194MB/s-200MB/s), io=886GiB (951GB), run=301049-301133msec
  WRITE: bw=2007MiB/s (2104MB/s), 123MiB/s-127MiB/s (129MB/s-133MB/s), io=590GiB (634GB), run=301049-301133msec

fadvise(..., POSIX_FADV_DONTNEED):
   READ: bw=2918MiB/s (3060MB/s), 180MiB/s-184MiB/s (188MB/s-193MB/s), io=855GiB (918GB), run=300014-300111msec
  WRITE: bw=1944MiB/s (2038MB/s), 120MiB/s-123MiB/s (125MB/s-129MB/s), io=570GiB (612GB), run=300014-300111msec

   READ: bw=2951MiB/s (3095MB/s), 182MiB/s-188MiB/s (191MB/s-197MB/s), io=867GiB (931GB), run=300529-300695msec
  WRITE: bw=1966MiB/s (2061MB/s), 121MiB/s-124MiB/s (127MB/s-130MB/s), io=577GiB (620GB), run=300529-300695msec

   READ: bw=2971MiB/s (3115MB/s), 181MiB/s-188MiB/s (190MB/s-197MB/s), io=871GiB (935GB), run=300015-300077msec
  WRITE: bw=1979MiB/s (2076MB/s), 122MiB/s-125MiB/s (128MB/s-131MB/s), io=580GiB (623GB), run=300015-300077msec

::::::::::::::::::::::::::::::

The numbers are pretty close, but it looks like RWF_DONTCACHE edges out
the other modes. Also, with the RWF_DONTCACHE and fadvise() modes the
pagecache utilization stays very low on the server (which is, of course,
the point).

I think next I'll test a hybrid mode: use RWF_DONTCACHE for READ and
stable WRITE operations, and do the fadvise() only after COMMITs.

Plumbing this in for v4 will be "interesting" if we decide this approach
is sound, but it shouldn't be too bad if we only do it after a COMMIT.

Thoughts?

Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
Jeff Layton (2):
      sunrpc: delay pc_release callback until after sending a reply
      nfsd: call generic_fadvise after v3 READ, stable WRITE or COMMIT

 fs/nfsd/debugfs.c  |  2 ++
 fs/nfsd/nfs3proc.c | 59 +++++++++++++++++++++++++++++++++++++++++++++---------
 fs/nfsd/nfsd.h     |  1 +
 fs/nfsd/nfsproc.c  |  4 ++--
 fs/nfsd/vfs.c      | 21 ++++++++++++++-----
 fs/nfsd/vfs.h      |  5 +++--
 fs/nfsd/xdr3.h     |  3 +++
 net/sunrpc/svc.c   | 19 ++++++++++++++----
 8 files changed, 92 insertions(+), 22 deletions(-)
---
base-commit: 38ddcbef7f4e9c5aa075c8ccf9f6d5293e027951
change-id: 20250701-nfsd-testing-12e7c8da5f1c

Best regards,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
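
The claim that RWF_DONTCACHE edges out the other modes can be checked by
averaging the three runs per mode. The following is only a rough sketch:
the bandwidth figures are the aggregate "bw=" values (MiB/s) copied from
the fio output above, while the names RESULTS, SUMMARY and summarize()
are made up for this illustration and are not part of the patch series.

```python
from statistics import mean

# Aggregate bandwidth in MiB/s, copied from the fio "bw=" fields reported above.
RESULTS = {
    "baseline": {"read": [2999, 2866, 2885], "write": [1998, 1909, 1923]},
    "RWF_DONTCACHE": {"read": [3088, 3116, 3011], "write": [2058, 2077, 2007]},
    "fadvise(DONTNEED)": {"read": [2918, 2951, 2971], "write": [1944, 1966, 1979]},
}


def summarize(results, reference="baseline"):
    """Return {mode: {op: (mean_bw, percent_change_vs_reference)}}."""
    ref = {op: mean(vals) for op, vals in results[reference].items()}
    summary = {}
    for mode, ops in results.items():
        summary[mode] = {}
        for op, vals in ops.items():
            avg = mean(vals)
            # Percent change of this mode's mean relative to the reference mode.
            summary[mode][op] = (avg, 100.0 * (avg - ref[op]) / ref[op])
    return summary


SUMMARY = summarize(RESULTS)

if __name__ == "__main__":
    for mode, ops in SUMMARY.items():
        for op, (avg, pct) in ops.items():
            print(f"{mode:18s} {op:5s} mean={avg:7.1f} MiB/s ({pct:+.1f}% vs baseline)")
```

On these figures RWF_DONTCACHE averages roughly 5% above baseline for
both reads and writes, and fadvise() roughly 1% above. With only three
runs per mode, the gap is best read as indicative rather than conclusive.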