On Tue, 15 Jul 2025 at 14:31, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
> On 7/15/25 5:24 AM, Daire Byrne wrote:
> > Just a quick note to say that we are one of the examples (batch render
> > farm) where we rely on the NFSD pagecache a lot.
>
> The new O_DIRECT style READs depend on the cache in the underlying block
> devices to keep READs fast. So, there is still some caching happening
> on the NFS server in this mode.

Ah right, of course. I wonder how much we actually use nfsd pagecache
versus the block device pagecache then...

> > We have read heavy workloads where many clients share much of the same
> > input data (e.g. rendering sequential frames).
> >
> > In fact, our 2 x 100gbit servers have 3TB of RAM and serve 70% of all
> > reads from nfsd pagecache. It is not uncommon to max out the 200gbit
> > network in this way even with spinning rust storage.
>
> Can you tell us what persistent storage underlies your data sets? Are
> the hard drives in a hardware or software RAID, for example?

Generally SAS-attached external RAID arrays. We often use another
smaller NVMe layer too (dm-cache or opencas) in front of it (LVM +
XFS).

But really, it's the 3TB of RAM per server (1PB disk) that does most
of our heavy lifting. Our read/write ratio is something like 5:1 and
we have a pretty aggressive/short writeback cache (to minimise long
write backlogs).

Looking forward to multi-threaded writeback to see how that helps us.

> Note that Mike's features are enabled via a debugfs switch -- this is
> because they are experimental for the moment. The default setting is
> to continue using the server's page cache.

Yep, all good. Like you said, it may be that we are more reliant on
the block device cache anyway.

Cheers,

Daire
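
[For illustration, a rough sketch of the kind of layering described above:
an NVMe LV attached as a dm-cache layer in front of a SAS RAID LV under
XFS, plus shorter dirty-writeback thresholds. All device names, volume
names and values below are hypothetical, not the configuration used on
the servers discussed in this thread.]

    # Hypothetical devices: /dev/mapper/raid_lun = SAS RAID LUN,
    # /dev/nvme0n1 = NVMe device used for the cache layer
    pvcreate /dev/mapper/raid_lun /dev/nvme0n1
    vgcreate vg_data /dev/mapper/raid_lun /dev/nvme0n1

    # Main data LV on the RAID PV, cache LV on the NVMe PV
    lvcreate -n data -l 100%PVS vg_data /dev/mapper/raid_lun
    lvcreate -n fastcache -l 95%PVS vg_data /dev/nvme0n1

    # Attach the NVMe LV as a dm-cache layer in front of the data LV
    lvconvert --type cache --cachevol fastcache vg_data/data
    mkfs.xfs /dev/vg_data/data

    # Keep the dirty/writeback window short so write backlogs stay small
    # (illustrative values only)
    sysctl -w vm.dirty_background_bytes=$((1 * 1024 * 1024 * 1024))
    sysctl -w vm.dirty_bytes=$((4 * 1024 * 1024 * 1024))
    sysctl -w vm.dirty_expire_centisecs=500
    sysctl -w vm.dirty_writeback_centisecs=100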