On Fri, 2025-05-09 at 15:03 -0400, cel@xxxxxxxxxx wrote:
> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>
> There is an upper bound on the number of rdma_rw contexts that can
> be created per QP.
>
> This upper bound exists because rdma_create_qp() adds one or more
> additional SQEs for each ctxt that the ULP requests via
> qp_attr.cap.max_rdma_ctxs. The QP's actual Send Queue length is on
> the order of the sum of qp_attr.cap.max_send_wr and a factor times
> qp_attr.cap.max_rdma_ctxs. The factor can be up to three, depending
> on whether MR operations are required before RDMA Reads.
>
> This limit is not visible to RDMA consumers via dev->attrs. When the
> limit is exceeded, QP creation fails with -ENOMEM. For example:
>
> svcrdma's estimate of the number of rdma_rw contexts it needs is
> three times RPCSVC_MAXPAGES. When MAXPAGES is about 260, the
> internally-computed SQ length should be:
>
>    64 credits + 10 backlog + 3 * (3 * 260) = 2414
>
> That is well below the advertised qp_max_wr of 32768.
>
> If RPCSVC_MAXPAGES is increased to support a 4MB maximum payload,
> that's 1040 pages:
>
>    64 credits + 10 backlog + 3 * (3 * 1040) = 9434
>
> However, QP creation fails. Dynamic printk for mlx5 shows:
>
> calc_sq_size:618:(pid 1514): send queue size (9326 * 256 / 64 -> 65536) exceeds limits(32768)
>
> Even though 9326 is far below qp_max_wr, QP creation still fails.
>
> Because the total SQ length calculation is opaque to RDMA consumers,
> there doesn't seem to be much that can be done about this except for
> consumers to try to keep the requested rdma_rw ctxt count low.
>
> Fixes: 2da0f610e733 ("svcrdma: Increase the per-transport rw_ctx count")
> Reviewed-by: NeilBrown <neil@xxxxxxxxxx>
> Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> ---
>  net/sunrpc/xprtrdma/svc_rdma_transport.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 5940a56023d1..3d7f1413df02 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -406,12 +406,12 @@ static void svc_rdma_xprt_done(struct rpcrdma_notification *rn)
>   */
>  static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  {
> +	unsigned int ctxts, rq_depth, maxpayload;
>  	struct svcxprt_rdma *listen_rdma;
>  	struct svcxprt_rdma *newxprt = NULL;
>  	struct rdma_conn_param conn_param;
>  	struct rpcrdma_connect_private pmsg;
>  	struct ib_qp_init_attr qp_attr;
> -	unsigned int ctxts, rq_depth;
>  	struct ib_device *dev;
>  	int ret = 0;
>  	RPC_IFDEBUG(struct sockaddr *sap);
> @@ -462,12 +462,14 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  		newxprt->sc_max_bc_requests = 2;
>  	}
>
> -	/* Arbitrarily estimate the number of rw_ctxs needed for
> -	 * this transport. This is enough rw_ctxs to make forward
> -	 * progress even if the client is using one rkey per page
> -	 * in each Read chunk.
> +	/* Arbitrary estimate of the needed number of rdma_rw contexts.
>  	 */
> -	ctxts = 3 * RPCSVC_MAXPAGES;
> +	maxpayload = min(xprt->xpt_server->sv_max_payload,
> +			 RPCSVC_MAXPAYLOAD_RDMA);
> +	ctxts = newxprt->sc_max_requests * 3 *
> +		rdma_rw_mr_factor(dev, newxprt->sc_port_num,
> +				  maxpayload >> PAGE_SHIFT);
> +
>  	newxprt->sc_sq_depth = rq_depth + ctxts;
>  	if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr)
>  		newxprt->sc_sq_depth = dev->attrs.max_qp_wr;

Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
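
For anyone who wants the arithmetic in one place, here is a rough
userspace sketch of the sizing described above. It is not provider or
svcrdma code: the helper names are invented for illustration, the
factor of three and the 64-credit/10-backlog numbers come from the
patch description, and the 256-byte WQE size and power-of-two rounding
are inferred from the quoted mlx5 debug output.

	/*
	 * Userspace-only sketch of the hidden Send Queue sizing
	 * described in the patch description above.
	 */
	#include <stdio.h>

	/* Up to this many SQEs may be consumed per rdma_rw context */
	#define RW_CTX_SQE_FACTOR	3

	static unsigned int estimated_sq_depth(unsigned int max_send_wr,
					       unsigned int max_rdma_ctxs)
	{
		/* Approximate SQ length the provider ends up building */
		return max_send_wr + RW_CTX_SQE_FACTOR * max_rdma_ctxs;
	}

	static unsigned int roundup_pow_of_two(unsigned int v)
	{
		unsigned int p = 1;

		while (p < v)
			p <<= 1;
		return p;
	}

	int main(void)
	{
		unsigned int send_wr = 64 + 10;	/* credits + backlog */
		unsigned int max_qp_wr = 32768;	/* advertised device limit */

		/* Old estimate: ctxts = 3 * RPCSVC_MAXPAGES, ~260 pages */
		printf("260 pages:  SQ depth ~ %u (qp_max_wr %u)\n",
		       estimated_sq_depth(send_wr, 3 * 260), max_qp_wr);

		/* RPCSVC_MAXPAGES grown to cover a 4MB payload, ~1040 pages */
		printf("1040 pages: SQ depth ~ %u (qp_max_wr %u)\n",
		       estimated_sq_depth(send_wr, 3 * 1040), max_qp_wr);

		/*
		 * What the mlx5 message "(9326 * 256 / 64 -> 65536) exceeds
		 * limits(32768)" appears to be computing: the requested
		 * depth scaled by the WQE size in 64-byte units, rounded
		 * up to a power of two.
		 */
		printf("provider SQ units: %u -> %u (limit %u)\n",
		       9326 * 256 / 64,
		       roundup_pow_of_two(9326 * 256 / 64), max_qp_wr);

		return 0;
	}

Built with a plain cc, this prints 2414, 9434, and 37304 -> 65536,
which matches the numbers in the description and shows how a request
well under qp_max_wr can still blow past the provider's internal
limit.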