Re: [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE

On 4/24/25 12:51 PM, Mike Snitzer wrote:
> On Wed, Apr 23, 2025 at 11:30:21AM -0400, Chuck Lever wrote:
>> On 4/23/25 11:22 AM, Matthew Wilcox wrote:
>>> On Wed, Apr 23, 2025 at 10:38:37AM -0400, Chuck Lever wrote:
>>>> On 4/23/25 12:25 AM, trondmy@xxxxxxxxxx wrote:
>>>>> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>>>>>
>>>>> The following patch set attempts to add support for the RWF_DONTCACHE
>>>>> flag in preadv2() and pwritev2() on NFS filesystems.
>>>>
>>>> Hi Trond-
>>>>
>>>> "RFC" in the subject field noted.
>>>>
>>>> The cover letter does not explain why one would want this facility, nor
>>>> does it quantify the performance implications.
>>>>
>>>> I can understand not wanting to cache on an NFS server, but don't you
>>>> want to maintain a data cache as close to applications as possible?
>>>
>>> If you look at the original work for RWF_DONTCACHE, you'll see this is
>>> the application providing the hint that it's doing a streaming access.
>>> It's only applied to folios which are created as a result of this
>>> access, and other accesses to these folios while the folios are in use
>>> clear the flag.  So it's kind of like O_DIRECT access, except that it
>>> does go through the page cache so there's none of this funky alignment
>>> requirement on the userspace buffers.
>>
>> OK, was wondering whether this behavior was opt-in; sounds like it is.
>> Thanks for setting me straight.
> 
> Yes, it's certainly opt-in (it requires setting a flag for each use).
> Jens added support in fio relatively recently, see:
> https://git.kernel.dk/cgit/fio/commit/?id=43c67b9f3a8808274bc1e0a3b7b70c56bb8a007f
> 
> Looking ahead to NFSD, as you know we've discussed exposing
> per-export config controls to enable use of DONTCACHE.  Finer-grained
> controls (e.g. only for large sequential I/O) would be more desirable,
> but I'm not aware of a simple way to detect such workloads in NFSD.
> 
> Could it be that we'd do well to carry large folio support through
> NFSD and expose a configurable threshold at or above which DONTCACHE
> is used?
> 
> What is the status of large folio support in NFSD?  Is anyone actively
> working on it?

The nfsd_splice_actor() is the current bottleneck: it converts large
folios from the page cache into a pipe full of single pages. The plan
is to measure the difference between NFSD's splice read and vectored
read paths; hopefully they are close enough that we can remove splice
read. Beepy has said he will look into that performance measurement.

Anna has mentioned some work on large folio support using xdr_buf, but
I haven't reviewed patches there.

And we need DMA API support for bio_vec iov_iters to make the socket
and RDMA transports perform roughly equivalently. Leon has met some
resistance from the DMA maintainers, but pretty much every direct
consumer of the DMA API is eager to get this facility.

Once those prerequisites are in place, large folio support in NFSD
should be straightforward.


-- 
Chuck Lever


