On Wed, Jun 11, 2025 at 10:31:20AM -0400, Chuck Lever wrote: > On 6/10/25 4:57 PM, Mike Snitzer wrote: > > Add 'enable-dontcache' to NFSD's debugfs interface so that: Any data > > read or written by NFSD will either not be cached (thanks to O_DIRECT) > > or will be removed from the page cache upon completion (DONTCACHE). > > I thought we were going to do two switches: One for reads and one for > writes? I could be misremembering. We did discuss the possibility of doing that. Still can-do if that's what you'd prefer. > After all, you are describing two different facilities here: a form of > direct I/O for READs, and RWF_DONTCACHE for WRITEs (I think?). My thinking was NFSD doesn't need to provide faithful pure RWF_DONTCACHE if it really doesn't make sense. But the "dontcache" name can be (ab)used by NFSD to define it how it sees fit (O_DIRECT doesn't cache so it seems fair). What I arrived at with this patchset is how I described in my cover letter: When 'enable-dontcache' is used: - all READs will use O_DIRECT (both DIO-aligned and misaligned) - all DIO-aligned WRITEs will use O_DIRECT (useful for SUNRPC RDMA) - misaligned WRITEs currently continue to use normal buffered IO But we reserve the right to iterate on the implementation details as we see fit. Still using the umbrella of 'dontcache'. > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > > index 7d94fae1dee8..bba3e6f4f56b 100644 > > --- a/fs/nfsd/vfs.c > > +++ b/fs/nfsd/vfs.c > > @@ -49,6 +49,7 @@ > > #define NFSDDBG_FACILITY NFSDDBG_FILEOP > > > > bool nfsd_disable_splice_read __read_mostly; > > +bool nfsd_enable_dontcache __read_mostly; > > > > /** > > * nfserrno - Map Linux errnos to NFS errnos > > @@ -1086,6 +1087,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp, > > unsigned long v, total; > > struct iov_iter iter; > > loff_t ppos = offset; > > + rwf_t flags = 0; > > ssize_t host_err; > > size_t len; > > > > @@ -1103,7 +1105,11 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp, > > > > trace_nfsd_read_vector(rqstp, fhp, offset, *count); > > iov_iter_bvec(&iter, ITER_DEST, rqstp->rq_bvec, v, *count); > > - host_err = vfs_iter_read(file, &iter, &ppos, 0); > > + > > + if (nfsd_enable_dontcache) > > + flags |= RWF_DONTCACHE; > > Two things: > > - Maybe NFSD should record whether the file system is DONTCACHE-enabled > in @fhp or in the export it is associated with, and then check that > setting here before asserting RWF_DONTCACHE Sure, that'd be safer than allowing RWF_DONTCACHE to be tried only to get EOPNOTSUPP because the underlying filesystem doesn't enable support. Could follow what I did with nfsd_file only storing the dio_* alignment data retrieved from statx IFF 'enable-dontcache' was enabled at the time the nfsd_file was opened. By adding check for FOP_DONTCACHE being set in underlying filesystem. But as-is, we're not actually using RWF_DONTCACHE in the final form of what I've provided in this patchset. So can easily circle back to adding this if/when we do decide to use RWF_DONTCACHE. > - I thought we were going with O_DIRECT for READs. Yes, this is just an intermediate patch that goes away in later patches. I was more focused on minimal patch to get the 'enable-dontcache' debugfs interface in place and tweaking it to its ultimate form in later patch. I put in place a more general framework that can evolve... it being more free-form (e.g. "don't worry your pretty head about the implementation details, we'll worry for you"). Causes some reviewer angst I suppose, so I can just fold patches to do away with unused intermediate state. Mike