On 5/6/25 9:31 AM, Christoph Hellwig wrote:
> On Mon, Apr 28, 2025 at 03:36:56PM -0400, cel@xxxxxxxxxx wrote:
>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>
>> Allow allocation of more entries in the rc_pages[] array when the
>> maximum size of an RPC message is increased.
>
> Can we maybe also look into a way to not allocate the pages in the
> rqst first just to free them when they get replaced with those from the
> RDMA receive context? Currently a lot of memory is wasted and pointless
> burden is put on the page allocator when using the RDMA transport on
> the server side.

You're talking specifically about:

1. svcrdma issues RDMA Read WRs from an svc_rqst thread context. It
   pulls the Read sink pages out of the svc_rqst's rq_pages[] array,
   and then svc_alloc_arg() refills the rq_pages[] array before the
   thread returns to the thread pool.

2. When the RDMA Read completes, it is picked up by an svc_rqst
   thread. svcrdma frees the pages in that thread's rq_pages[] array
   and replaces them with the Read's sink pages.

I've looked at this several times over the years. It's a tough problem
to balance against concerns like preventing a denial of service.

For example, an attempt was made to handle the RDMA Read synchronously
in the same thread that handles the Receive. That had to be reverted:
if the client is slow to furnish the Read payload, the svc_rqst thread
is tied up for the duration, which is a DoS vector.

One idea would be for NFSD to maintain its own pool of these pages. But
I'm not convinced that we could invent anything with lower latency than
the generic bulk page allocator: release_pages() and alloc_pages_bulk().

--
Chuck Lever