Re: [PATCH v3 0/5] NFSD: add "NFSD DIRECT" and "NFSD DONTCACHE" IO modes

Just a quick note to say that we are one of the examples (batch render
farm) where we rely on the NFSD pagecache a lot.

We have read-heavy workloads where many clients share much of the same
input data (e.g. rendering sequential frames).

In fact, our 2 x 100 Gbit servers have 3TB of RAM and serve 70% of all
reads from the nfsd pagecache. It is not uncommon to max out the 200 Gbit
network in this way, even with spinning rust storage.
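
To put rough numbers on that, here is a back-of-envelope sketch. It
assumes the 70% hit rate applies by bytes and that both links are
saturated, neither of which is measured here, and the function name is
only for illustration:

```python
def backend_read_rate_gbit(link_gbit, links, cache_hit_ratio):
    """Return (network_gbit, storage_gbit) when the network is saturated.

    Assumption: cache_hit_ratio is the fraction of read *bytes* served
    from the pagecache, so the remainder must come from storage.
    """
    network = link_gbit * links
    storage = network * (1.0 - cache_hit_ratio)
    return network, storage


# Figures from this mail: 2 x 100 Gbit links, 70% of reads from pagecache.
network, storage = backend_read_rate_gbit(100, 2, 0.70)
print(f"network: {network} Gbit/s ({network / 8:.1f} GB/s)")
print(f"storage: {storage:.0f} Gbit/s ({storage / 8:.1f} GB/s)")
```

Under those assumptions the disks only have to sustain about 30% of
what goes out on the wire, and the pagecache covers the rest.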

Anyway, as you were.

Daire

On Mon, 14 Jul 2025 at 23:42, Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
>
> Hi,
>
> Summary (by Jeff Layton [0]):
> "The basic problem is that the pagecache is pretty useless for
> satisfying READs from nfsd. Most NFS workloads don't involve I/O to
> the same files from multiple clients. The client ends up having most
> of the data in its cache already and only very rarely do we need to
> revisit the data on the server.
>
> At the same time, it's really easy to overwhelm the storage with
> pagecache writeback with modern memory sizes. Having nfsd bypass the
> pagecache altogether is potentially a huge performance win, if it can
> be made to work safely."
>
> The performance win associated with using NFSD DIRECT was previously
> summarized here:
> https://lore.kernel.org/linux-nfs/aEslwqa9iMeZjjlV@xxxxxxxxxx/
> This picture offers a nice summary of performance gains:
> https://original.art/NFSD_direct_vs_buffered_IO.jpg
>
> This v3 series was developed on top of Chuck's nfsd-testing, which has 2
> patches that saw fh_getattr() moved, etc. (v2 of this series included
> those patches, but since they got review during v2 and Chuck already
> has them staged in nfsd-testing I didn't think it made sense to keep
> them included in this v3).
>
> Changes since v2 include:
> - explored suggestion to use string based interface (e.g. "direct"
>   instead of 3) but debugfs seems to support only numeric values.
> - shifted numeric values for debugfs interface from 0-2 to 1-3 and
>   made 0 UNSPECIFIED (which is the default)
> - if user specifies io_cache_read or io_cache_write mode other than 1,
>   2 or 3 (via debugfs) they will get an error message
> - pass a data structure to nfsd_analyze_read_dio rather than so many
>   in/out params
> - improved comments as requested (e.g. "Must remove first
>   start_extra_page from rqstp->rq_bvec" was reworked)
> - use memmove instead of open-coded shift in
>   nfsd_complete_misaligned_read_dio
> - dropped the still very important "lib/iov_iter: remove piecewise
>   bvec length checking in iov_iter_aligned_bvec" patch because it
>   needs to be handled separately.
> - various other changes to improve code
>
> Thanks,
> Mike
>
> [0]: https://lore.kernel.org/linux-nfs/b1accdad470f19614f9d3865bb3a4c69958e5800.camel@xxxxxxxxxx/
>
> Mike Snitzer (5):
>   NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support
>   NFSD: pass nfsd_file to nfsd_iter_read()
>   NFSD: add io_cache_read controls to debugfs interface
>   NFSD: add io_cache_write controls to debugfs interface
>   NFSD: issue READs using O_DIRECT even if IO is misaligned
>
>  fs/nfsd/debugfs.c          | 102 +++++++++++++++++++
>  fs/nfsd/filecache.c        |  32 ++++++
>  fs/nfsd/filecache.h        |   4 +
>  fs/nfsd/nfs4xdr.c          |   8 +-
>  fs/nfsd/nfsd.h             |  10 ++
>  fs/nfsd/nfsfh.c            |   4 +
>  fs/nfsd/trace.h            |  37 +++++++
>  fs/nfsd/vfs.c              | 197 ++++++++++++++++++++++++++++++++++---
>  fs/nfsd/vfs.h              |   2 +-
>  include/linux/sunrpc/svc.h |   5 +-
>  10 files changed, 383 insertions(+), 18 deletions(-)
>
> --
> 2.44.0
>
>