On 2025-09-05 22:22:48, Dust Li wrote:
>On 2025-09-05 14:01:35, Halil Pasic wrote:
>>On Fri, 5 Sep 2025 11:00:59 +0200
>>Halil Pasic <pasic@xxxxxxxxxxxxx> wrote:
>>
>>> > 1. What if the two sides have different max_send_wr/max_recv_wr
>>> > configurations? IIUC, for example, if the client sets max_send_wr
>>> > to 64 but the server sets max_recv_wr to 16, the client might
>>> > overflow the server's QP receive queue, potentially causing an
>>> > RNR (Receiver Not Ready) error.
>>>
>>> I don't think the 16 is spec'ed anywhere, and if the client and the
>>> server need to agree on the same value, it should either be spec'ed
>>> or a protocol mechanism for negotiating it needs to exist. So what
>>> is your take on this as an SMC maintainer?
>
>Right, I didn't realize that either until I saw this patch today :)
>But since it has been set to 16 in the implementation since day one,
>bumping it up might break things.
>
>>>
>>> I think we have tested heterogeneous setups and didn't see any grave
>>> issues. But let me please do a follow-up on this. Maybe the other
>>> maintainers can chime in as well.
>
>I'm glad to hear from others.
>
>>
>>Did some research and some thinking. Are you concerned about a
>>performance regression for e.g. 64 -> 16 compared to 16 -> 16?
>>According to my current understanding, the RNR must not lead to a
>>catastrophic failure; the RDMA/IB stack is supposed to handle that.
>
>No, it's not just a performance regression.
>If we get an RNR when going from 64 -> 16, the whole link group gets
>torn down, and all SMC connections inside it break.
>So from the user's point of view, connections will just randomly drop
>out of nowhere.

I double-checked the code and noticed we set qp_attr.rnr_retry =
SMC_QP_RNR_RETRY = 7, which means "infinite retries" [1]. So the QP
will just keep retrying; we won't actually get an RNR.

That said, the worst case really is just a performance regression, and
in this case I would regard that as acceptable. We can go with this.

Best regards,
Dust
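
[1] The rnr_retry value is applied when the QP is moved to RTS. For
reference, here is a minimal sketch of that transition, based on my
reading of smc_ib_modify_qp_rts() in net/smc/smc_ib.c; the helper name
qp_to_rts() is made up for this note, and the real kernel function
differs in the details:

#include <linux/string.h>
#include <linux/types.h>
#include <rdma/ib_verbs.h>

#define SMC_QP_TIMEOUT		15	/* local ack timeout (exponent) */
#define SMC_QP_RETRY_CNT	7	/* max transport retry count */
#define SMC_QP_RNR_RETRY	7	/* 7 == retry forever on RNR NAK */

static int qp_to_rts(struct ib_qp *qp, u32 psn_initial)
{
	struct ib_qp_attr qp_attr;

	memset(&qp_attr, 0, sizeof(qp_attr));
	qp_attr.qp_state = IB_QPS_RTS;
	qp_attr.timeout = SMC_QP_TIMEOUT;
	qp_attr.retry_cnt = SMC_QP_RETRY_CNT;
	/* 7 is the IB-defined "infinite" encoding, so an RNR NAK never
	 * surfaces as a QP error; the sender just keeps resending.
	 */
	qp_attr.rnr_retry = SMC_QP_RNR_RETRY;
	qp_attr.sq_psn = psn_initial;	/* starting send packet sequence number */
	qp_attr.max_rd_atomic = 1;	/* outstanding RDMA reads / atomics */

	return ib_modify_qp(qp, &qp_attr,
			    IB_QP_STATE | IB_QP_TIMEOUT | IB_QP_RETRY_CNT |
			    IB_QP_SQ_PSN | IB_QP_RNR_RETRY |
			    IB_QP_MAX_QP_RD_ATOMIC);
}

With rnr_retry set to the infinite encoding, receive-queue exhaustion
on a 16-entry receiver shows up as the sender stalling and resending,
not as a link group tear-down.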