On Wed, 2025-06-11 at 15:18 -0400, Mike Snitzer wrote: > On Wed, Jun 11, 2025 at 10:31:20AM -0400, Chuck Lever wrote: > > On 6/10/25 4:57 PM, Mike Snitzer wrote: > > > Add 'enable-dontcache' to NFSD's debugfs interface so that: Any data > > > read or written by NFSD will either not be cached (thanks to O_DIRECT) > > > or will be removed from the page cache upon completion (DONTCACHE). > > > > I thought we were going to do two switches: One for reads and one for > > writes? I could be misremembering. > > We did discuss the possibility of doing that. Still can-do if that's > what you'd prefer. > Having them as separate controls in debugfs is fine for experimentation's sake, but I imagine we'll need to be all-in one way or the other with a real interface. I think if we can crack the problem of receiving WRITE payloads into an already-aligned buffer, then that becomes much more feasible. I think that's a solveable problem. > > After all, you are describing two different facilities here: a form of > > direct I/O for READs, and RWF_DONTCACHE for WRITEs (I think?). > > My thinking was NFSD doesn't need to provide faithful pure > RWF_DONTCACHE if it really doesn't make sense. But the "dontcache" > name can be (ab)used by NFSD to define it how it sees fit (O_DIRECT > doesn't cache so it seems fair). What I arrived at with this patchset > is how I described in my cover letter: > > When 'enable-dontcache' is used: > - all READs will use O_DIRECT (both DIO-aligned and misaligned) > - all DIO-aligned WRITEs will use O_DIRECT (useful for SUNRPC RDMA) > - misaligned WRITEs currently continue to use normal buffered IO > > But we reserve the right to iterate on the implementation details as > we see fit. Still using the umbrella of 'dontcache'. > > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > > > index 7d94fae1dee8..bba3e6f4f56b 100644 > > > --- a/fs/nfsd/vfs.c > > > +++ b/fs/nfsd/vfs.c > > > @@ -49,6 +49,7 @@ > > > #define NFSDDBG_FACILITY NFSDDBG_FILEOP > > > > > > bool nfsd_disable_splice_read __read_mostly; > > > +bool nfsd_enable_dontcache __read_mostly; > > > > > > /** > > > * nfserrno - Map Linux errnos to NFS errnos > > > @@ -1086,6 +1087,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp, > > > unsigned long v, total; > > > struct iov_iter iter; > > > loff_t ppos = offset; > > > + rwf_t flags = 0; > > > ssize_t host_err; > > > size_t len; > > > > > > @@ -1103,7 +1105,11 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp, > > > > > > trace_nfsd_read_vector(rqstp, fhp, offset, *count); > > > iov_iter_bvec(&iter, ITER_DEST, rqstp->rq_bvec, v, *count); > > > - host_err = vfs_iter_read(file, &iter, &ppos, 0); > > > + > > > + if (nfsd_enable_dontcache) > > > + flags |= RWF_DONTCACHE; > > > > Two things: > > > > - Maybe NFSD should record whether the file system is DONTCACHE-enabled > > in @fhp or in the export it is associated with, and then check that > > setting here before asserting RWF_DONTCACHE > > Sure, that'd be safer than allowing RWF_DONTCACHE to be tried only to > get EOPNOTSUPP because the underlying filesystem doesn't enable > support. > > Could follow what I did with nfsd_file only storing the dio_* > alignment data retrieved from statx IFF 'enable-dontcache' was enabled > at the time the nfsd_file was opened. > > By adding check for FOP_DONTCACHE being set in underlying filesystem. > But as-is, we're not actually using RWF_DONTCACHE in the final form of > what I've provided in this patchset. So can easily circle back to > adding this if/when we do decide to use RWF_DONTCACHE. > > > - I thought we were going with O_DIRECT for READs. > > Yes, this is just an intermediate patch that goes away in later > patches. I was more focused on minimal patch to get the > 'enable-dontcache' debugfs interface in place and tweaking it to its > ultimate form in later patch. > > I put in place a more general framework that can evolve... it being > more free-form (e.g. "don't worry your pretty head about the > implementation details, we'll worry for you"). > > Causes some reviewer angst I suppose, so I can just fold patches to do > away with unused intermediate state. > > Mike -- Jeff Layton <jlayton@xxxxxxxxxx>