Re: [PATCH] blk-wbt: Fix io starvation in wbt_rqw_done()

Yu Kuai <yukuai@xxxxxxxxxx> · Fri, 1 Aug 2025 01:12:55 +0800

Hi,

在 2025/7/31 23:40, Yizhou Tang 写道:
Hi Julian,

On Thu, Jul 31, 2025 at 8:33 PM Julian Sun <sunjunchao2870@xxxxxxxxx> wrote:
Recently, we encountered the following hungtask:

INFO: task kworker/11:2:2981147 blocked for more than 6266 seconds
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/11:2    D    0 2981147      2 0x80004000
Workqueue: cgroup_destroy css_free_rwork_fn
Call Trace:
  __schedule+0x934/0xe10
  schedule+0x40/0xb0
  wb_wait_for_completion+0x52/0x80
I don’t see __wbt_wait() or rq_qos_wait() here, so I suspect this call
stack is not directly related to wbt.


  ? finish_wait+0x80/0x80
  mem_cgroup_css_free+0x3a/0x1b0
  css_free_rwork_fn+0x42/0x380
  process_one_work+0x1a2/0x360
  worker_thread+0x30/0x390
  ? create_worker+0x1a0/0x1a0
  kthread+0x110/0x130
  ? __kthread_cancel_work+0x40/0x40
  ret_from_fork+0x1f/0x30
This is writeback cgroup is waiting for writeback to be done, if you 
figured out
they are throttled by wbt, you need to explain clearly, and it's very 
important to
provide evidence to support your analysis. However, the following 
analysis is
a mess :(

This is because the writeback thread has been continuously and repeatedly
throttled by wbt, but at the same time, the writes of another thread
proceed quite smoothly.
After debugging, I believe it is caused by the following reasons.

When thread A is blocked by wbt, the I/O issued by thread B will
use a deeper queue depth(rwb->rq_depth.max_depth) because it
meets the conditions of wb_recent_wait(), thus allowing thread B's
I/O to be issued smoothly and resulting in the inflight I/O of wbt
remaining relatively high.

However, when I/O completes, due to the high inflight I/O of wbt,
the condition "limit - inflight >= rwb->wb_background / 2"
in wbt_rqw_done() cannot be satisfied, causing thread A's I/O
to remain unable to be woken up.
 From your description above, it seems you're suggesting that if A is
throttled by wbt, then a writer B on the same device could
continuously starve A.
This situation is not possible — please refer to rq_qos_wait(): if A
is already sleeping, then when B calls wq_has_sleeper(), it will
detect A’s presence, meaning B will also be throttled.
Yes, there are three rq_wait in wbt, and each one is FIFO. It will be 
possible
if  A is backgroup, and B is swap.

Thanks,
Yi

Some on-site information:

rwb.rq_depth.max_depth
(unsigned int)48
rqw.inflight.counter.value_()
44
rqw.inflight.counter.value_()
35
prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)3
prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)2
prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)20
prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
(unsigned long)12

cat wb_normal
24
cat wb_background
12

To fix this issue, we can use max_depth in wbt_rqw_done(), so that
the handling of wb_recent_wait by wbt_rqw_done() and get_limit()
will also be consistent, which is more reasonable.
Are you able to reproduce this problem, and give this patch a test before
you send it?

Thanks,
Kuai

Signed-off-by: Julian Sun <sunjunchao@xxxxxxxxxxxxx>
Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
---
  block/blk-wbt.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index a50d4cd55f41..d6a2782d442f 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -210,6 +210,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
         else if (blk_queue_write_cache(rwb->rqos.disk->queue) &&
                  !wb_recent_wait(rwb))
                 limit = 0;
+       else if (wb_recent_wait(rwb))
+               limit = rwb->rq_depth.max_depth;
         else
                 limit = rwb->wb_normal;

--
2.20.1