On Tue, 15 Jul 2025 at 14:31, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
> On 7/15/25 5:24 AM, Daire Byrne wrote:
> > Just a quick note to say that we are one of the examples (batch render
> > farm) where we rely on the NFSD pagecache a lot.
>
> The new O_DIRECT style READs depend on the cache in the underlying block
> devices to keep READs fast. So, there is still some caching happening
> on the NFS server in this mode.

Ah right, of course. I wonder how much we actually use nfsd pagecache
versus the block device pagecache then...

> > We have read heavy workloads where many clients share much of the same
> > input data (e.g. rendering sequential frames).
> >
> > In fact, our 2 x 100gbit servers have 3TB of RAM and serve 70% of all
> > reads from nfsd pagecache. It is not uncommon to max out the 200gbit
> > network in this way even with spinning rust storage.
>
> Can you tell us what persistent storage underlies your data sets? Are
> the hard drives in a hardware or software RAID, for example?

Generally SAS-attached external RAID arrays. We often use another
smaller NVMe layer too (dm-cache or opencas) in front of it (LVM +
XFS).

But really, it's the 3TB of RAM per server (1PB disk) that does most
of our heavy lifting. Our read/write ratio is something like 5:1 and
we have a pretty aggressive/short writeback cache (to minimise long
write backlogs).

Looking forward to multi-threaded writeback to see how that helps us.

> Note that Mike's features are enabled via a debugfs switch -- this is
> because they are experimental for the moment. The default setting is
> to continue using the server's page cache.

Yep, all good. Like you said, it may be that we are more reliant on
the block device cache anyway.

Cheers,

Daire
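
[For illustration, a rough sketch of the kind of layering described above:
an NVMe LV attached as a dm-cache layer in front of a SAS RAID LV under
XFS, plus shorter dirty-writeback thresholds. All device names, volume
names and values below are hypothetical, not the configuration used on
the servers discussed in this thread.]

    # Hypothetical devices: /dev/mapper/raid_lun = SAS RAID LUN,
    # /dev/nvme0n1 = NVMe device used for the cache layer
    pvcreate /dev/mapper/raid_lun /dev/nvme0n1
    vgcreate vg_data /dev/mapper/raid_lun /dev/nvme0n1

    # Main data LV on the RAID PV, cache LV on the NVMe PV
    lvcreate -n data -l 100%PVS vg_data /dev/mapper/raid_lun
    lvcreate -n fastcache -l 95%PVS vg_data /dev/nvme0n1

    # Attach the NVMe LV as a dm-cache layer in front of the data LV
    lvconvert --type cache --cachevol fastcache vg_data/data
    mkfs.xfs /dev/vg_data/data

    # Keep the dirty/writeback window short so write backlogs stay small
    # (illustrative values only)
    sysctl -w vm.dirty_background_bytes=$((1 * 1024 * 1024 * 1024))
    sysctl -w vm.dirty_bytes=$((4 * 1024 * 1024 * 1024))
    sysctl -w vm.dirty_expire_centisecs=500
    sysctl -w vm.dirty_writeback_centisecs=100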