On 5/6/25 9:31 AM, Christoph Hellwig wrote:
> On Mon, Apr 28, 2025 at 03:36:56PM -0400, cel@xxxxxxxxxx wrote:
>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>
>> Allow allocation of more entries in the rc_pages[] array when the
>> maximum size of an RPC message is increased.
>
> Can we maybe also look into a way to not allocate the pages in the
> rqst first just to free them when they get replaced with those from the
> RDMA receive context? Currently a lot of memory is wasted and pointless
> burden is put on the page allocator when using the RDMA transport on
> the server side.

You're talking specifically about:

1. svcrdma issues RDMA Read WRs from an svc_rqst thread context. It
   pulls the Read sink pages out of the svc_rqst's rq_pages[] array,
   and then svc_alloc_arg() refills the rq_pages[] array before the
   thread returns to the thread pool.

2. When the RDMA Read completes, it is picked up by an svc_rqst
   thread. svcrdma frees the pages in that thread's rq_pages[] array
   and replaces them with the Read's sink pages.

I've looked at this several times over the years. It's a tough problem
to balance against concerns like preventing a denial of service.

For example, an attempt was made to handle the RDMA Read synchronously
in the same thread that handles the Receive. That had to be reverted:
if the client is slow to furnish the Read payload, the svc_rqst thread
is tied up for the duration, which is a DoS vector.

One idea would be for NFSD to maintain its own pool of these pages. But
I'm not convinced that we could invent anything with lower latency than
the generic bulk page allocator: release_pages() and alloc_pages_bulk().

--
Chuck Lever