Hi,

On Fri, Aug 1, 2025 at 1:13 AM Yu Kuai <yukuai@xxxxxxxxxx> wrote:
>
> Hi,
>
> On 2025/7/31 23:40, Yizhou Tang wrote:
> > Hi Julian,
> >
> > On Thu, Jul 31, 2025 at 8:33 PM Julian Sun <sunjunchao2870@xxxxxxxxx> wrote:
> >> Recently, we encountered the following hung task:
> >>
> >> INFO: task kworker/11:2:2981147 blocked for more than 6266 seconds
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> kworker/11:2 D 0 2981147 2 0x80004000
> >> Workqueue: cgroup_destroy css_free_rwork_fn
> >> Call Trace:
> >> __schedule+0x934/0xe10
> >> schedule+0x40/0xb0
> >> wb_wait_for_completion+0x52/0x80
> > I don’t see __wbt_wait() or rq_qos_wait() here, so I suspect this call
> > stack is not directly related to wbt.
> >
> >
> >> ? finish_wait+0x80/0x80
> >> mem_cgroup_css_free+0x3a/0x1b0
> >> css_free_rwork_fn+0x42/0x380
> >> process_one_work+0x1a2/0x360
> >> worker_thread+0x30/0x390
> >> ? create_worker+0x1a0/0x1a0
> >> kthread+0x110/0x130
> >> ? __kthread_cancel_work+0x40/0x40
> >> ret_from_fork+0x1f/0x30
> This is the writeback cgroup waiting for writeback to be done. If you
> figured out that they are throttled by wbt, you need to explain it
> clearly, and it's very important to provide evidence to support your
> analysis. However, the following analysis is a mess :(

Thanks for the detailed review. Yes, the description is a bit
confusing. I will take a more detailed look at the on-site information.

> >>
> >> This is because the writeback thread has been continuously and repeatedly
> >> throttled by wbt, but at the same time, the writes of another thread
> >> proceed quite smoothly.
> >> After debugging, I believe it is caused by the following reasons.
> >>
> >> When thread A is blocked by wbt, the I/O issued by thread B will
> >> use a deeper queue depth (rwb->rq_depth.max_depth) because it
> >> meets the conditions of wb_recent_wait(), thus allowing thread B's
> >> I/O to be issued smoothly and resulting in the inflight I/O of wbt
> >> remaining relatively high.
> >>
> >> However, when I/O completes, due to the high inflight I/O of wbt,
> >> the condition "limit - inflight >= rwb->wb_background / 2"
> >> in wbt_rqw_done() cannot be satisfied, causing thread A's I/O
> >> to remain unable to be woken up.
> > From your description above, it seems you're suggesting that if A is
> > throttled by wbt, then a writer B on the same device could
> > continuously starve A.
> > This situation is not possible; please refer to rq_qos_wait(): if A
> > is already sleeping, then when B calls wq_has_sleeper(), it will
> > detect A’s presence, meaning B will also be throttled.
> Yes, there are three rq_wait in wbt, and each one is FIFO. It will be
> possible if A is background, and B is swap.
> >
> > Thanks,
> > Yi
> >
> >> Some on-site information:
> >>
> >> >>> rwb.rq_depth.max_depth
> >> (unsigned int)48
> >> >>> rqw.inflight.counter.value_()
> >> 44
> >> >>> rqw.inflight.counter.value_()
> >> 35
> >> >>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
> >> (unsigned long)3
> >> >>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
> >> (unsigned long)2
> >> >>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
> >> (unsigned long)20
> >> >>> prog['jiffies'] - rwb.rqos.q.backing_dev_info.last_bdp_sleep
> >> (unsigned long)12
> >>
> >> cat wb_normal
> >> 24
> >> cat wb_background
> >> 12
> >>
> >> To fix this issue, we can use max_depth in wbt_rqw_done(), so that
> >> the handling of wb_recent_wait by wbt_rqw_done() and get_limit()
> >> will also be consistent, which is more reasonable.
> Are you able to reproduce this problem, and give this patch a test before
> you send it?
>
> Thanks,
> Kuai
> >>
> >> Signed-off-by: Julian Sun <sunjunchao@xxxxxxxxxxxxx>
> >> Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
> >> ---
> >> block/blk-wbt.c | 2 ++
> >> 1 file changed, 2 insertions(+)
> >>
> >> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> >> index a50d4cd55f41..d6a2782d442f 100644
> >> --- a/block/blk-wbt.c
> >> +++ b/block/blk-wbt.c
> >> @@ -210,6 +210,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
> >>         else if (blk_queue_write_cache(rwb->rqos.disk->queue) &&
> >>                  !wb_recent_wait(rwb))
> >>                 limit = 0;
> >> +       else if (wb_recent_wait(rwb))
> >> +               limit = rwb->rq_depth.max_depth;
> >>         else
> >>                 limit = rwb->wb_normal;
> >>
> >> --
> >> 2.20.1
> >>
> >>
>

Thanks,
--
Julian Sun <sunjunchao2870@xxxxxxxxx>
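
To make the arithmetic in the report easier to check, here is a minimal Python sketch of the wake-up condition quoted above, fed with the on-site values (max_depth 48, wb_normal 24, wb_background 12, inflight 44 and 35). It is not the kernel code: WbtState, done_limit() and would_wake() are hypothetical names, only the branches visible in the quoted hunk are modeled (any other checks in wbt_rqw_done() are left out), and it assumes wb_recent_wait() returns true, as the report claims.

```python
from dataclasses import dataclass


@dataclass
class WbtState:
    """Hypothetical stand-in for the few rq_wb fields quoted in the thread."""
    max_depth: int
    wb_normal: int
    wb_background: int


def done_limit(rwb, recent_wait, write_cache, patched):
    """Pick the completion-side limit using only the branches in the quoted hunk."""
    if write_cache and not recent_wait:
        return 0
    if patched and recent_wait:
        # Branch added by the proposed patch.
        return rwb.max_depth
    return rwb.wb_normal


def would_wake(rwb, inflight, recent_wait=True, write_cache=True, patched=False):
    """Evaluate "limit - inflight >= wb_background / 2" as stated in the report."""
    limit = done_limit(rwb, recent_wait, write_cache, patched)
    return limit - inflight >= rwb.wb_background // 2


# On-site values from the report; recent_wait=True is an assumption.
rwb = WbtState(max_depth=48, wb_normal=24, wb_background=12)
for inflight in (44, 35):
    for patched in (False, True):
        print(f"inflight={inflight} patched={patched} "
              f"wake={would_wake(rwb, inflight, patched=patched)}")
```

Under this simplified model, the unpatched condition is false for both samples, since 24 - 44 and 24 - 35 are negative. With the patched limit it is still false at inflight 44 (48 - 44 = 4, below 12 / 2 = 6) and becomes true at inflight 35 (48 - 35 = 13). In other words, the model suggests the change only helps once inflight drops to 42 or lower, which may be worth confirming on a reproducer.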