On 2025-09-05 22:22:48, Dust Li wrote:
>On 2025-09-05 14:01:35, Halil Pasic wrote:
>>On Fri, 5 Sep 2025 11:00:59 +0200
>>Halil Pasic <pasic@xxxxxxxxxxxxx> wrote:
>>
>>> > 1. What if the two sides have different max_send_wr/max_recv_wr
>>> > configurations? IIUC, for example, if the client sets max_send_wr
>>> > to 64 but the server sets max_recv_wr to 16, the client might
>>> > overflow the server's QP receive queue, potentially causing an
>>> > RNR (Receiver Not Ready) error.
>>>
>>> I don't think the 16 is spec'ed anywhere, and if the client and the
>>> server need to agree on the same value, it should either be spec'ed
>>> or a protocol mechanism for negotiating it needs to exist. So what
>>> is your take on this as an SMC maintainer?
>
>Right, I didn't realize that either until I saw this patch today :)
>But since it has been set to 16 in the implementation since day one,
>bumping it up might break things.
>
>>>
>>> I think we have tested heterogeneous setups and didn't see any grave
>>> issues. But let me please do a follow-up on this. Maybe the other
>>> maintainers can chime in as well.
>
>I'm glad to hear from others.
>
>>
>>Did some research and some thinking. Are you concerned about a
>>performance regression for e.g. 64 -> 16 compared to 16 -> 16?
>>According to my current understanding, the RNR must not lead to a
>>catastrophic failure; the RDMA/IB stack is supposed to handle that.
>
>No, it's not just a performance regression.
>If we get an RNR when going from 64 -> 16, the whole link group gets
>torn down, and all SMC connections inside it break.
>So from the user's point of view, connections will just randomly drop
>out of nowhere.

I double-checked the code and noticed we set qp_attr.rnr_retry =
SMC_QP_RNR_RETRY = 7, which means "infinite retries" [1]. So the QP
will just keep retrying; we won't actually get an RNR.

That said, the worst case really is just a performance regression, and
in this case I would regard that as acceptable. We can go with this.

Best regards,
Dust
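
[1] The rnr_retry value is applied when the QP is moved to RTS. For
reference, here is a minimal sketch of that transition, based on my
reading of smc_ib_modify_qp_rts() in net/smc/smc_ib.c; the helper name
qp_to_rts() is made up for this note, and the real kernel function
differs in the details:

#include <linux/string.h>
#include <linux/types.h>
#include <rdma/ib_verbs.h>

#define SMC_QP_TIMEOUT		15	/* local ack timeout (exponent) */
#define SMC_QP_RETRY_CNT	7	/* max transport retry count */
#define SMC_QP_RNR_RETRY	7	/* 7 == retry forever on RNR NAK */

static int qp_to_rts(struct ib_qp *qp, u32 psn_initial)
{
	struct ib_qp_attr qp_attr;

	memset(&qp_attr, 0, sizeof(qp_attr));
	qp_attr.qp_state = IB_QPS_RTS;
	qp_attr.timeout = SMC_QP_TIMEOUT;
	qp_attr.retry_cnt = SMC_QP_RETRY_CNT;
	/* 7 is the IB-defined "infinite" encoding, so an RNR NAK never
	 * surfaces as a QP error; the sender just keeps resending.
	 */
	qp_attr.rnr_retry = SMC_QP_RNR_RETRY;
	qp_attr.sq_psn = psn_initial;	/* starting send packet sequence number */
	qp_attr.max_rd_atomic = 1;	/* outstanding RDMA reads / atomics */

	return ib_modify_qp(qp, &qp_attr,
			    IB_QP_STATE | IB_QP_TIMEOUT | IB_QP_RETRY_CNT |
			    IB_QP_SQ_PSN | IB_QP_RNR_RETRY |
			    IB_QP_MAX_QP_RD_ATOMIC);
}

With rnr_retry set to the infinite encoding, receive-queue exhaustion
on a 16-entry receiver shows up as the sender stalling and resending,
not as a link group tear-down.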