On Tue, 2025-07-22 at 22:01 +0300, Anton Gavriliuk wrote:
> > The only way you can avoid memory copies here is to use RDMA to allow
> > the server to write its replies directly into the correct client read
> > buffers.
>
> I remounted with rdma
>
> [root@23-127-77-6 ~]# mount -t nfs -o proto=rdma,nconnect=16,rsize=4194304,wsize=4194304 192.168.0.7:/mnt /mnt
> [root@23-127-77-6 ~]# mount -v|grep -i rdma
> 192.168.0.7:/mnt on /mnt type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,fatal_neterrors=none,proto=rdma,nconnect=16,port=20049,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.8,local_lock=none,addr=192.168.0.7)
> [root@23-127-77-6 ~]#
>
> and repeated the sequential read.
>
> According to perf top, memcpy is gone:
>
> Samples: 64K of event 'cycles:P', 4000 Hz, Event count (approx.): 22510217633 lost: 0/0 drop: 0/0
>   Overhead  Shared Object  Symbol
>    13,12%   [nfs]          [k] nfs_generic_pg_test
>    11,32%   [nfs]          [k] nfs_page_group_lock
>    10,42%   [nfs]          [k] nfs_clear_request
>     5,41%   [kernel]       [k] gup_fast_pte_range
>     4,11%   [nfs]          [k] nfs_page_group_sync_on_bit
>     3,36%   [nfs]          [k] nfs_page_create
>     3,13%   [nfs]          [k] __nfs_pageio_add_request
>     2,10%   [nfs]          [k] __nfs_find_lock_context
>
> but it didn't improve read bandwidth at all. It is even slightly worse
> compared to proto=tcp.

So that more or less proves that those memcpys were never the root cause
of your performance problem.

I suspect you'll want to look at the server performance. Maybe also look
at the client tunables that limit concurrency, such as the
sunrpc.rdma_slot_table_entries sysctl, or the nfs.max_session_slots
module parameter, etc.

>
> Anton
>
> On Tue, 22 Jul 2025 at 21:43, Trond Myklebust <trondmy@xxxxxxxxxx> wrote:
> >
> > On Tue, 2025-07-22 at 21:10 +0300, Anton Gavriliuk wrote:
> > > Hi
> > >
> > > I am trying to exceed 20 GB/s doing sequential reads from a single
> > > file on the NFS client.
> > >
> > > perf top shows excessive memcpy usage:
> > >
> > > Samples: 237K of event 'cycles:P', 4000 Hz, Event count (approx.): 120872739112 lost: 0/0 drop: 0/0
> > >   Overhead  Shared Object  Symbol
> > >    20,54%   [kernel]       [k] memcpy
> > >     6,52%   [nfs]          [k] nfs_generic_pg_test
> > >     5,12%   [nfs]          [k] nfs_page_group_lock
> > >     4,92%   [kernel]       [k] _copy_to_iter
> > >     4,79%   [kernel]       [k] gro_list_prepare
> > >     2,77%   [nfs]          [k] nfs_clear_request
> > >     2,10%   [nfs]          [k] __nfs_pageio_add_request
> > >     2,07%   [kernel]       [k] check_heap_object
> > >     2,00%   [kernel]       [k] __slab_free
> > >
> > > Can the NFS client be adapted to use zero copy? For example, by
> > > using io_uring zero-copy rx.
> > >
> >
> > The client has no idea in which order the server will return replies
> > to the RPC calls it sends. So no, it can't queue up those reply
> > buffers in advance.
> >
> > The only way you can avoid memory copies here is to use RDMA to allow
> > the server to write its replies directly into the correct client read
> > buffers.
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx
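
A minimal sketch of how the two concurrency tunables mentioned above can be inspected and raised on the client. The paths assume a typical distro kernel with the xprtrdma and nfs modules loaded, the values 256 and 180 are purely illustrative (defaults and maximums vary by kernel version), and both settings only apply to transports/sessions created after the change, so the export has to be remounted:

  # Current limits on the client
  cat /proc/sys/sunrpc/rdma_slot_table_entries
  cat /sys/module/nfs/parameters/max_session_slots

  # Raise the RPC-over-RDMA slot table for subsequently created transports
  sysctl -w sunrpc.rdma_slot_table_entries=256

  # max_session_slots is an nfs module parameter; persist it, then remount
  echo "options nfs max_session_slots=180" > /etc/modprobe.d/nfs-session-slots.conf

Whether raising either limit helps depends on whether the transfer is actually slot-limited rather than server-limited; the per-transport xprt: counters in /proc/self/mountstats on the client should show whether the existing slot limit is being reached before another full benchmark run is attempted.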