Re: [PATCH for-rc v1] RDMA/rxe: Avoid CQ polling hang triggered by CQ resize

On 2025/08/18 13:44, Zhu Yanjun wrote:
On 2025/8/17 5:37, Daisuke Matsuda wrote:
When running the test_resize_cq testcase from rdma-core, polling a
completion queue from userspace may occasionally hang and eventually fail
with a timeout:
=====
ERROR: test_resize_cq (tests.test_cq.CQTest.test_resize_cq)
Test resize CQ, start with specific value and then increase and decrease
----------------------------------------------------------------------
Traceback (most recent call last):
     File "/root/deb/rdma-core/tests/test_cq.py", line 135, in test_resize_cq
       u.poll_cq(self.client.cq)
     File "/root/deb/rdma-core/tests/utils.py", line 687, in poll_cq
       wcs = _poll_cq(cq, count, data)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/root/deb/rdma-core/tests/utils.py", line 669, in _poll_cq
       raise PyverbsError(f'Got timeout on polling ({count} CQEs remaining)')
pyverbs.pyverbs_error.PyverbsError: Got timeout on polling (1 CQEs
remaining)
=====

The issue occurs when rxe_cq_post() fails to post a CQE because the queue
is temporarily full, so the CQE is effectively lost. To mitigate this, add
a bounded busy-wait with a rescheduling fallback so that the CQE does not
get lost.

Signed-off-by: Daisuke Matsuda <dskmtsd@xxxxxxxxx>
---
  drivers/infiniband/sw/rxe/rxe_cq.c | 27 +++++++++++++++++++++++++--
  1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_cq.c b/drivers/infiniband/sw/rxe/rxe_cq.c
index fffd144d509e..7b0fba63204e 100644
--- a/drivers/infiniband/sw/rxe/rxe_cq.c
+++ b/drivers/infiniband/sw/rxe/rxe_cq.c
@@ -84,14 +84,36 @@ int rxe_cq_resize_queue(struct rxe_cq *cq, int cqe,
  /* caller holds reference to cq */
  int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
  {
+    unsigned long flags;
+    u32 spin_cnt = 3000;
      struct ib_event ev;
-    int full;
      void *addr;
-    unsigned long flags;
+    int full;
      spin_lock_irqsave(&cq->cq_lock, flags);
      full = queue_full(cq->queue, QUEUE_TYPE_TO_CLIENT);
+    if (likely(!full))
+        goto post_queue;
+
+    /* constant backoff until queue is ready */
+    while (spin_cnt--) {
+        full = queue_full(cq->queue, QUEUE_TYPE_TO_CLIENT);
+        if (!full)
+            goto post_queue;
+
+        cpu_relax();
+    }

The loop runs up to 3000 times. Each iteration:

- checks queue_full()
- executes cpu_relax()

On modern CPUs, each iteration may take roughly 4–10 cycles, depending on memory/cache behavior.

Suppose 1 cycle ≈ 0.3 ns on a 3 GHz CPU, so 10 cycles ≈ 3 ns.

3000 iterations × 10 cycles ≈ 30,000 cycles
30,000 cycles × 0.3 ns = 9,000 ns = 9 microseconds

So the “critical section” while spinning is on the order of ten microseconds, not milliseconds.

I was concerned that 3000 iterations might make the spin lock critical section too long, but based on the analysis above, it appears that this is still a short-duration critical section.

Thank you for the review.

Assuming the two loads in queue_full() hit in the L1 cache, I estimate each iteration could take around
15–20 cycles. Based on your calculation, the maximum total time would be approximately 18 microseconds.
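For reference, queue_full() here is essentially a ring-buffer fullness test:
one load of the producer index, one load of the consumer index, then a mask
and a compare (the "two loads" above). A minimal sketch of that shape, purely
illustrative and not the actual rxe_queue.h code (ring_full and its
parameters are made-up names):

#include <stdint.h>

/* Illustrative power-of-two ring-buffer fullness test with the same
 * general shape as queue_full(): two index loads plus register
 * arithmetic, which stays cheap as long as both indices hit in L1.
 */
static inline int ring_full(uint32_t prod, uint32_t cons, uint32_t index_mask)
{
	/* One slot is kept empty so "full" and "empty" are distinguishable. */
	return ((prod + 1) & index_mask) == (cons & index_mask);
}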


I am not sure if it is a big spin lock critical section or not.
If it is not,

In my opinion, this duration is acceptable, as the thread does not actually spin for that long
in practice. During my testing, it never reached the cond_resched() fallback, so the
current spin count appears sufficient to avoid the failure case.
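For what it is worth, the order of magnitude is easy to sanity-check from
userspace. A rough sketch (x86-only because of _mm_pause(), and only an
approximation of the bounded busy-wait, not the actual rxe path):

#include <stdio.h>
#include <time.h>
#include <immintrin.h>	/* _mm_pause(), x86 only */

/* Time 3000 iterations of a load + pause loop, as a rough stand-in
 * for the spin loop discussed above.
 */
int main(void)
{
	volatile unsigned int fake_index = 0;
	struct timespec t0, t1;
	unsigned int sink = 0;
	long ns;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < 3000; i++) {
		sink += fake_index;	/* stand-in for the queue_full() loads */
		_mm_pause();		/* userspace analogue of cpu_relax() */
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ns = (t1.tv_sec - t0.tv_sec) * 1000000000L +
	     (t1.tv_nsec - t0.tv_nsec);
	printf("3000 iterations: %ld ns (sink=%u)\n", ns, sink);
	return 0;
}

If this lands in the single-digit to tens-of-microseconds range, it is
consistent with the estimates above.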

Thanks,
Daisuke


Reviewed-by: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>

Zhu Yanjun

+
+    /* try giving up cpu and retry */
+    if (full) {
+        spin_unlock_irqrestore(&cq->cq_lock, flags);
+        cond_resched();
+        spin_lock_irqsave(&cq->cq_lock, flags);
+
+        full = queue_full(cq->queue, QUEUE_TYPE_TO_CLIENT);
+    }
+
      if (unlikely(full)) {
          rxe_err_cq(cq, "queue full\n");
          spin_unlock_irqrestore(&cq->cq_lock, flags);
@@ -105,6 +127,7 @@ int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
          return -EBUSY;
      }
+ post_queue:
      addr = queue_producer_addr(cq->queue, QUEUE_TYPE_TO_CLIENT);
      memcpy(addr, cqe, sizeof(*cqe));





