Re: [RFC PATCH 1/2] NFSD: fix misaligned DIO READ to not use a start_extra_page, exposes rpcrdma bug?

On Thu, Sep 04, 2025 at 12:10:00PM -0400, Chuck Lever wrote:
> On 9/4/25 10:42 AM, Mike Snitzer wrote:
> > On Tue, Sep 02, 2025 at 05:27:11PM -0400, Mike Snitzer wrote:
> >> On Tue, Sep 02, 2025 at 05:16:10PM -0400, Chuck Lever wrote:
> >>>
> >>> I am testing with a physically separate client and server, so I believe
> >>> that LOCALIO is not in play. I do see WRITEs. And other workloads (in
> >>> particular "fsx -Z <fname>") show READ traffic and I'm getting the
> >>> new trace point to fire quite a bit, and it is showing misaligned
> >>> READ requests. So it has something to do with dt.
> >>
> >> OK, yeah I figured you weren't doing loopback mount, only thing that
> >> came to mind for you not seeing READ like expected.  I haven't had any
> >> problems with dt not driving READs to NFSD...
> >>
> >> You'll certainly need to see READs in order for NFSD's new misaligned
> >> DIO READ handling to get tested.
> > 
> > I was doing some additional testing of the v9 changes last night and
> > realized why you weren't seeing any READs come through to NFSD:
> > "flags=direct" must be added to the dt commandline. Otherwise it'll
> > use buffered IO at the client and the READ will be serviced by the
> > client's page cache.
> > 
> > But like I said in another reply: when I just use v3 and RDMA (without
> > the intermediary of flexfiles at the client) I'm not able to see the
> > data mismatch with dt...
> > 
> > So while its unlikely: does adding "flags=direct" cause dt to fail
> > when NFSD handles the misaligned DIO READ?
> Applied v9.
> 
> Multiple successful runs, no failures after adding "flags=direct".
> Some excerpts from the last run show the server is seeing NFS
> READs now:
> 
> Filesystem options:
>   rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,
>   fatal_neterrors=none,proto=rdma,port=20049,timeo=600,retrans=2,
>   sec=sys,mountaddr=192.168.2.55,mountvers=3,mountproto=tcp,
>   local_lock=none,addr=192.168.2.55
> 
> nfsd-1342  [004]   463.832928: nfsd_analyze_read_dio: xid=0x89784d89
> fh_hash=0x024204eb offset=0 len=47008 start=0+0 middle=0+47008 end=47008+96
> nfsd-1342  [004]   463.833105: nfsd_analyze_read_dio: xid=0x8a784d89
> fh_hash=0x024204eb offset=47008 len=47008 start=46592+416
> middle=47008+47008 end=94016+192
> nfsd-1342  [004]   463.833185: nfsd_analyze_read_dio: xid=0x8b784d89
> fh_hash=0x024204eb offset=94016 len=47008 start=93696+320
> middle=94016+47008 end=141024+288
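
For anyone reading the trace excerpts above: the start/middle/end values
are consistent with each misaligned READ being expanded outward to
512-byte boundaries. The sketch below only reproduces that arithmetic.
The 512-byte alignment is inferred from the numbers in the trace, and
split_misaligned_read() is a hypothetical helper for illustration, not
a function from NFSD.

```python
# Sketch only: the 512-byte alignment is inferred from the trace values,
# and split_misaligned_read() is a hypothetical helper, not NFSD code.
DIO_ALIGN = 512


def split_misaligned_read(offset, length, align=DIO_ALIGN):
    """Return (start, middle, end) segments as (file_offset, byte_count)."""
    aligned_start = offset - (offset % align)        # round offset down
    request_end = offset + length
    aligned_end = -(-request_end // align) * align   # round request end up

    start = (aligned_start, offset - aligned_start)  # extra bytes read before
    middle = (offset, length)                        # bytes the client asked for
    end = (request_end, aligned_end - request_end)   # extra bytes read after
    return start, middle, end


if __name__ == "__main__":
    # The three READs from the trace: same length, consecutive offsets.
    for off in (0, 47008, 94016):
        start, middle, end = split_misaligned_read(off, 47008)
        print(f"offset={off} len=47008 "
              f"start={start[0]}+{start[1]} "
              f"middle={middle[0]}+{middle[1]} "
              f"end={end[0]}+{end[1]}")
```

Run against the three requests in the trace, this prints the same
start/middle/end fields the trace point reported, which is why 512
looks like the alignment in use here. It may differ on other setups.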

OK, thanks for testing!

So yeah, patch 9/9 of v9 does work around the problem relative to
flexfiles+RDMA (though the patch header should really be updated to add
"flags=direct" to the dt command line):
https://lore.kernel.org/linux-nfs/20250903205121.41380-10-snitzer@xxxxxxxxxx/

Is it a tolerable intermediate workaround you'd be OK with?  To be
clear, I'm continuing to work the problem (and will be discussing it
with Trond)... but it's a tricky one for sure.

Mike



