Re: [PATCH v5 01/19] svcrdma: Reduce the number of rdma_rw contexts per-QP

On Fri, 2025-05-09 at 15:03 -0400, cel@xxxxxxxxxx wrote:
> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> 
> There is an upper bound on the number of rdma_rw contexts that can
> be created per QP.
> 
> This upper bound is invisible because rdma_create_qp() adds one or
> more additional SQEs for each ctxt that the ULP requests via
> qp_attr.cap.max_rdma_ctxs. The QP's actual Send Queue length is on
> the order of qp_attr.cap.max_send_wr plus a factor times
> qp_attr.cap.max_rdma_ctxs. The factor can be as high as three,
> depending on whether MR operations are required before RDMA Reads.
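
For anyone following along: a rough sketch of that hidden arithmetic,
modeled on rdma_rw_init_qp() in drivers/infiniband/core/rw.c
(simplified, not the verbatim kernel code):

	u32 factor = 1;		/* one SQE per RDMA Read/Write ctxt */

	/* Devices that need MRs for RDMA Reads consume two more
	 * SQEs per ctxt: one to register, one to invalidate.
	 */
	if (rdma_rw_can_use_mr(dev, attr->port_num))
		factor += 2;

	attr->cap.max_send_wr += factor * attr->cap.max_rdma_ctxs;

In other words, the provider may allocate a Send Queue of up to
max_send_wr + 3 * max_rdma_ctxs entries.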
> 
> This limit is not visible to RDMA consumers via dev->attrs. When the
> limit is surpassed, QP creation fails with -ENOMEM. For example:
> 
> svcrdma's estimate of the number of rdma_rw contexts it needs is
> three times RPCSVC_MAXPAGES. When RPCSVC_MAXPAGES is about 260, the
> internally-computed SQ length should be:
> 
> 64 credits + 10 backlog + 3 * (3 * 260) = 2414
> 
> That total is well below the device's advertised qp_max_wr of 32768.
> 
> If RPCSVC_MAXPAGES is increased to support a 4MB maximum payload,
> that's 1040 pages:
> 
> 64 credits + 10 backlog + 3 * (3 * 1040) = 9434
> 
> However, QP creation fails. Dynamic printk for mlx5 shows:
> 
> calc_sq_size:618:(pid 1514): send queue size (9326 * 256 / 64 -> 65536) exceeds limits(32768)
> 
> That is, mlx5 rounds the total SQ size (9326 WRs * 256 bytes per
> WQE) up to a power of two and divides by its 64-byte basic block,
> yielding 65536 blocks, twice the 32768 limit. So although 9326 is
> still far below qp_max_wr, QP creation fails.
> 
> Because the total SQ length calculation is opaque to RDMA
> consumers, there isn't much that can be done about this except for
> consumers to keep their requested rdma_rw ctxt count low.
> 
> Fixes: 2da0f610e733 ("svcrdma: Increase the per-transport rw_ctx count")
> Reviewed-by: NeilBrown <neil@xxxxxxxxxx>
> Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> ---
>  net/sunrpc/xprtrdma/svc_rdma_transport.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 5940a56023d1..3d7f1413df02 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -406,12 +406,12 @@ static void svc_rdma_xprt_done(struct rpcrdma_notification *rn)
>   */
>  static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  {
> +	unsigned int ctxts, rq_depth, maxpayload;
>  	struct svcxprt_rdma *listen_rdma;
>  	struct svcxprt_rdma *newxprt = NULL;
>  	struct rdma_conn_param conn_param;
>  	struct rpcrdma_connect_private pmsg;
>  	struct ib_qp_init_attr qp_attr;
> -	unsigned int ctxts, rq_depth;
>  	struct ib_device *dev;
>  	int ret = 0;
>  	RPC_IFDEBUG(struct sockaddr *sap);
> @@ -462,12 +462,14 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  		newxprt->sc_max_bc_requests = 2;
>  	}
>  
> -	/* Arbitrarily estimate the number of rw_ctxs needed for
> -	 * this transport. This is enough rw_ctxs to make forward
> -	 * progress even if the client is using one rkey per page
> -	 * in each Read chunk.
> +	/* Arbitrary estimate of the needed number of rdma_rw contexts.
>  	 */
> -	ctxts = 3 * RPCSVC_MAXPAGES;
> +	maxpayload = min(xprt->xpt_server->sv_max_payload,
> +			 RPCSVC_MAXPAYLOAD_RDMA);
> +	ctxts = newxprt->sc_max_requests * 3 *
> +		rdma_rw_mr_factor(dev, newxprt->sc_port_num,
> +				  maxpayload >> PAGE_SHIFT);
> +
>  	newxprt->sc_sq_depth = rq_depth + ctxts;
>  	if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr)
>  		newxprt->sc_sq_depth = dev->attrs.max_qp_wr;
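
A quick sanity check of the new estimate, using the same accounting
as the patch description and assuming (purely for illustration) 64
credits and a device where rdma_rw_mr_factor() returns 1 for a 4MB
payload:

	ctxts = 64 * 3 * 1 = 192

	64 credits + 10 backlog + 3 * 192 = 650

which stays comfortably under the mlx5 limit that the old
3 * RPCSVC_MAXPAGES estimate exceeded.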

Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>




