This commit works around what seems like a flexfiles+rpcrdma bug, and
Chuck Lever clarified that this shouldn't be needed:
"Yes, the extra page needs to come from rq_pages. But I don't see why it
should come from the /end/ of rq_pages."

However, when using NFSD DIRECT for READ and an NFS 4.2 client with pNFS
flexfiles (and the client gets a layout to use a v3 DS) over RDMA, it is
easy to see data mismatch when NFSD handles a misaligned DIO READ. If
the same misaligned DIO READ is issued directly to the v3 DS over RDMA
(so flexfiles is _not_ used) then no data mismatch occurs.

Therefore, until this bug can be found, we must use a 'start_extra' page
from rq_pages that follows the NFS client requested READ payload (RDMA
memory) if/when expanding the misaligned READ requires reading an extra
partial page at the start of the READ so that it is DIO-aligned.
Otherwise, if the 'start_extra' page is taken from the beginning of
rq_pages, the pNFS flexfiles client will see data mismatch corruption.

This was found, and the fix of using the end of rq_pages then verified,
using the 'dt' utility:
  dt of=/mnt/share1/dt_a.test passes=1 bs=47008 count=2 \
     iotype=sequential pattern=iot onerr=abort oncerr=abort
see: https://github.com/RobinTMiller/dt.git

Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
---
 fs/nfsd/vfs.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 5b3c6072b6f5c..e9ddeec3c9a32 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1263,7 +1263,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (read_dio.start_extra) {
 		len = read_dio.start_extra;
 		bvec_set_page(&rqstp->rq_bvec[v],
-			      *(rqstp->rq_next_page++),
+			      NULL, /* set below */
 			      len, PAGE_SIZE - len);
 		total -= len;
 		++v;
@@ -1288,6 +1288,11 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		base = 0;
 	}
 	WARN_ON_ONCE(v > rqstp->rq_maxpages);
+	/* FIXME: having the start_extra page come from the end of
+	 * rq_pages[] works around what seems to be a flexfiles+rpcrdma bug.
+	 */
+	if ((kiocb.ki_flags & IOCB_DIRECT) && read_dio.start_extra)
+		rqstp->rq_bvec[0].bv_page = *(rqstp->rq_next_page++);
 	trace_nfsd_read_vector(rqstp, fhp, offset, in_count);
 	iov_iter_bvec(&iter, ITER_DEST, rqstp->rq_bvec, v, in_count);
-- 
2.44.0
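
For readers following the 'start_extra' discussion, here is a rough
userspace sketch of how a misaligned READ might be expanded to DIO
alignment. It is not the nfsd code: the struct dio_expand and
expand_read() names are hypothetical, and the 4096-byte alignment used in
the note below is an assumption (the real DIO alignment depends on the
underlying file and device).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only; not taken from fs/nfsd/vfs.c. */
struct dio_expand {
	uint64_t start;       /* aligned offset where the expanded READ begins */
	uint64_t start_extra; /* bytes read before the requested offset */
	uint64_t end_extra;   /* bytes read past the requested end */
};

/*
 * Expand [offset, offset + len) outward so that both ends fall on a
 * multiple of 'align'. 'align' is assumed to be non-zero.
 */
static struct dio_expand expand_read(uint64_t offset, uint64_t len,
				     uint64_t align)
{
	struct dio_expand e;
	uint64_t end = offset + len;
	uint64_t end_rem;

	assert(align != 0);
	end_rem = end % align;

	e.start_extra = offset % align;
	e.start = offset - e.start_extra;
	e.end_extra = end_rem ? align - end_rem : 0;
	return e;
}
```

With the dt block size from the commit message (bs=47008) and an assumed
alignment of 4096, the second record starts at offset 47008, so
expand_read(47008, 47008, 4096) would give start=45056, start_extra=1952
and end_extra=192. A non-zero start_extra is the case in which the patch
takes the extra page from the end of rq_pages.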