> The only way you can avoid memory copies here is to use RDMA to allow
> the server to write its replies directly into the correct client read
> buffers.

I remounted with RDMA:

[root@23-127-77-6 ~]# mount -t nfs -o proto=rdma,nconnect=16,rsize=4194304,wsize=4194304 192.168.0.7:/mnt /mnt
[root@23-127-77-6 ~]# mount -v|grep -i rdma
192.168.0.7:/mnt on /mnt type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,fatal_neterrors=none,proto=rdma,nconnect=16,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.8,local_lock=none,addr=192.168.0.7)
[root@23-127-77-6 ~]#

and repeated the sequential read. According to perf top, memcpy is gone,

Samples: 64K of event 'cycles:P', 4000 Hz, Event count (approx.): 22510217633 lost: 0/0 drop: 0/0
Overhead  Shared Object  Symbol
  13,12%  [nfs]          [k] nfs_generic_pg_test
  11,32%  [nfs]          [k] nfs_page_group_lock
  10,42%  [nfs]          [k] nfs_clear_request
   5,41%  [kernel]       [k] gup_fast_pte_range
   4,11%  [nfs]          [k] nfs_page_group_sync_on_bit
   3,36%  [nfs]          [k] nfs_page_create
   3,13%  [nfs]          [k] __nfs_pageio_add_request
   2,10%  [nfs]          [k] __nfs_find_lock_context

but it didn't improve read bandwidth at all. It was even slightly worse
than with proto=tcp.

Anton

Tue, 22 Jul 2025 at 21:43, Trond Myklebust <trondmy@xxxxxxxxxx>:
>
> On Tue, 2025-07-22 at 21:10 +0300, Anton Gavriliuk wrote:
> > Hi
> >
> > I am trying to exceed 20 GB/s doing sequential read from a single
> > file on the nfs client.
> >
> > perf top shows excessive memcpy usage:
> >
> > Samples: 237K of event 'cycles:P', 4000 Hz, Event count (approx.):
> > 120872739112 lost: 0/0 drop: 0/0
> > Overhead  Shared Object  Symbol
> >   20,54%  [kernel]       [k] memcpy
> >    6,52%  [nfs]          [k] nfs_generic_pg_test
> >    5,12%  [nfs]          [k] nfs_page_group_lock
> >    4,92%  [kernel]       [k] _copy_to_iter
> >    4,79%  [kernel]       [k] gro_list_prepare
> >    2,77%  [nfs]          [k] nfs_clear_request
> >    2,10%  [nfs]          [k] __nfs_pageio_add_request
> >    2,07%  [kernel]       [k] check_heap_object
> >    2,00%  [kernel]       [k] __slab_free
> >
> > Can the nfs client be adapted to use zero copy, for example by
> > using io_uring zero copy rx?
> >
>
> The client has no idea in which order the server will return replies to
> the RPC calls it sends. So no, it can't queue up those reply buffers in
> advance.
>
> The only way you can avoid memory copies here is to use RDMA to allow
> the server to write its replies directly into the correct client read
> buffers.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx
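
The reply-ordering argument above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how the kernel's sunrpc code is written: the function name deliver, the 4-byte buffers and the integer XIDs are all hypothetical. It assumes replies are matched to calls by their RPC transaction ID (XID), and compares "pre-post buffers in send order" with "look up the XID, then copy".

```python
def deliver(reply_order):
    """Toy model of READ replies arriving in the given XID order.

    Returns (misplaced, correct):
      misplaced - how many replies would land in the wrong buffer if the
                  buffers had been pre-posted in send order
      correct   - whether every buffer holds the right data after the
                  client matches each reply to its call by XID and copies
    """
    # Calls are assumed to be sent in ascending XID order (hypothetical).
    send_order = sorted(reply_order)
    # One destination buffer per outstanding call.
    pending = {xid: bytearray(4) for xid in send_order}
    # Each reply carries its XID and a payload that encodes that XID.
    replies = [(xid, xid.to_bytes(4, "big")) for xid in reply_order]

    # Naive zero-copy idea: the i-th reply fills the i-th posted buffer.
    misplaced = sum(
        1 for posted, (xid, _) in zip(send_order, replies) if posted != xid
    )

    # Matching by XID: look up the call, then copy the payload across.
    for xid, payload in replies:
        pending[xid][:] = payload  # this assignment stands in for the memcpy

    correct = all(
        int.from_bytes(buf, "big") == xid for xid, buf in pending.items()
    )
    return misplaced, correct


if __name__ == "__main__":
    print(deliver([0, 1, 2]))  # replies arrive in send order
    print(deliver([2, 0, 1]))  # replies arrive out of order
```

In this model the pre-posted scheme only works when replies happen to come back in send order. With RDMA the server is told where each reply belongs, so the lookup-then-copy step is not needed on the client.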