On 6/10/25 4:57 PM, Mike Snitzer wrote:
> IO must be aligned, otherwise it falls back to using buffered IO.
>
> RWF_DONTCACHE is _not_ currently used for misaligned IO (even when
> nfsd/enable-dontcache=1) because it works against us (due to RMW
> needing to read without benefit of cache), whereas buffered IO enables
> misaligned IO to be more performant.
>
> Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> ---
>  fs/nfsd/vfs.c | 40 ++++++++++++++++++++++++++++++++++++----
>  1 file changed, 36 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index e7cc8c6dfbad..a942609e3ab9 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1064,6 +1064,22 @@ __be32 nfsd_splice_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>          return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
>  }
>
> +static bool is_dio_aligned(const struct iov_iter *iter, loff_t offset,
> +                           const u32 blocksize)
> +{
> +        u32 blocksize_mask;
> +
> +        if (!blocksize)
> +                return false;
> +
> +        blocksize_mask = blocksize - 1;
> +        if ((offset & blocksize_mask) ||
> +            (iov_iter_alignment(iter) & blocksize_mask))
> +                return false;
> +
> +        return true;
> +}
> +
>  /**
>   * nfsd_iter_read - Perform a VFS read using an iterator
>   * @rqstp: RPC transaction context
> @@ -1107,8 +1123,16 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>          trace_nfsd_read_vector(rqstp, fhp, offset, *count);
>          iov_iter_bvec(&iter, ITER_DEST, rqstp->rq_bvec, v, *count);
>
> -        if (nfsd_enable_dontcache)
> -                flags |= RWF_DONTCACHE;
> +        if (nfsd_enable_dontcache) {
> +                if (is_dio_aligned(&iter, offset, nf->nf_dio_read_offset_align))
> +                        flags |= RWF_DIRECT;
> +                /* FIXME: not using RWF_DONTCACHE for misaligned IO because it works
> +                 * against us (due to RMW needing to read without benefit of cache),
> +                 * whereas buffered IO enables misaligned IO to be more performant.
> +                 */
> +                //else
> +                //        flags |= RWF_DONTCACHE;
> +        }
>
>          host_err = vfs_iter_read(file, &iter, &ppos, flags);
>          return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
> @@ -1217,8 +1241,16 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
>          nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
>          iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
>
> -        if (nfsd_enable_dontcache)
> -                flags |= RWF_DONTCACHE;
> +        if (nfsd_enable_dontcache) {
> +                if (is_dio_aligned(&iter, offset, nf->nf_dio_offset_align))
> +                        flags |= RWF_DIRECT;
> +                /* FIXME: not using RWF_DONTCACHE for misaligned IO because it works
> +                 * against us (due to RMW needing to read without benefit of cache),
> +                 * whereas buffered IO enables misaligned IO to be more performant.
> +                 */
> +                //else
> +                //        flags |= RWF_DONTCACHE;
> +        }

IMO adding RWF_DONTCACHE first then replacing it later in the series
with a form of O_DIRECT is confusing. Also, why add RWF_DONTCACHE here
and then take it away "because it doesn't work"?

But OK, your series is really a proof-of-concept. Something to work out
before it is merge-ready, I guess.

It is much more likely for NFS READ requests to be properly aligned.
Clients are generally good about that. NFS WRITE request alignment
is going to be arbitrary. Fwiw.

However, one thing we discussed at bake-a-thon was what to do about
unstable WRITEs. For unstable WRITEs, the server has to cache the
write data at least until the client sends a COMMIT. Otherwise the
server will have to convert all UNSTABLE writes to FILE_SYNC writes,
and that can have performance implications.

One thing you might consider is to continue using the page cache for
unstable WRITEs, and then use fadvise DONTNEED after a successful
COMMIT operation to reduce page cache footprint. Unstable writes to
the same range of the file might be a problem, however.

>          since = READ_ONCE(file->f_wb_err);
>          if (verf)

-- 
Chuck Lever
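
The "page cache for unstable WRITEs, then fadvise DONTNEED after COMMIT" idea can be sketched as a userspace analogy. This is not nfsd code: the helper names `write_commit_drop()` and `demo()` are hypothetical, `fsync()` merely stands in for COMMIT, and the only assumption about the kernel is the documented behaviour that `POSIX_FADV_DONTNEED` drops clean cached pages, which is why the advice is issued after the data is stable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Userspace analogy only (hypothetical helper, not nfsd code):
 *   1. buffered pwrite()     ~ UNSTABLE WRITE, data sits in the page cache
 *   2. fsync()               ~ COMMIT, data reaches stable storage
 *   3. POSIX_FADV_DONTNEED   ~ ask the kernel to drop the now-clean pages
 * Returns 0 on success, -1 on failure.
 */
int write_commit_drop(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL);

    /* Step 1: ordinary buffered write, looping over short writes. */
    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, off + (off_t)done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }

    /* Step 2: make the data stable before touching the cache. */
    if (fsync(fd) != 0)
        return -1;

    /* Step 3: posix_fadvise() returns an error number, not -1/errno. */
    if (posix_fadvise(fd, off, (off_t)len, POSIX_FADV_DONTNEED) != 0)
        return -1;

    return 0;
}

/* Small driver: run the helper against a temporary file. */
int demo(void)
{
    char path[] = "/tmp/commit_drop_XXXXXX";
    char data[8192];
    int fd, rc;

    memset(data, 'x', sizeof(data));
    fd = mkstemp(path);
    if (fd < 0)
        return -1;

    rc = write_commit_drop(fd, data, sizeof(data), 0);
    close(fd);
    unlink(path);
    return rc;
}
```

The sketch does nothing about the caveat raised above: if a second unstable write lands on the same range between the COMMIT and the DONTNEED advice, those pages are dirty again and the advice will not simply discard them.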