On Tue, Apr 22, 2025 at 09:28:02AM +0800, Yu Kuai wrote:
> Hi,
>
> On 2025/04/21 23:22, Keith Busch wrote:
> > On Mon, Apr 21, 2025 at 09:53:10AM +0100, Matt Fleming wrote:
> > > Hey there,
> > >
> > > We're moving to 6.12 at Cloudflare and noticed that write await times
> > > in iostat are 10x what they were in 6.6. After a bit of bpftracing
> > > (script to find all plug times above 10ms below), it seems like this
> > > is an accounting error caused by the plug->cur_ktime optimisation
> > > rather than anything more material.
> > >
> > > It appears as though a task can enter __submit_bio() with ->plug set
> > > and a very stale cur_ktime value on the order of milliseconds. Is this
> > > expected behaviour? It looks like it leads to inaccurate I/O times.
> >
> > There are places with a block plug that call cond_resched(), which
> > doesn't invalidate the plug's cached ktime. You could end up with a
> > stale ktime if your process is scheduled out.
>
> This is wrong, scheduled out will clear cached ktime. You can check
> it easily since there are not much caller to clear ktime.

Huh? cond_resched() calls __schedule() directly via
preempt_schedule_common(), which most certainly does not clear the
plug's time. The timestamp is only invalidated from schedule() or
rt_mutex_post_schedule(). You can check it ... "easily".
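
To make the two paths concrete, here is a minimal userspace sketch of the
behaviour described above. It is not kernel code: every model_* name, the
fake clock and the block_ts flag are invented for illustration, and the
only thing taken from the discussion is which path invalidates the cached
timestamp and which does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for illustration only, not the kernel's structures. */
struct model_plug {
	uint64_t cur_ktime;	/* 0 means "no cached timestamp" */
};

struct model_task {
	struct model_plug *plug;
	bool block_ts;		/* hypothetical flag: a timestamp is cached */
};

/* Fake monotonic clock in nanoseconds, advanced explicitly by the model. */
static uint64_t model_clock_ns;

/* Return the plug's cached time, filling the cache on first use. */
static uint64_t model_blk_time_get_ns(struct model_task *t)
{
	struct model_plug *plug = t->plug;

	if (!plug)
		return model_clock_ns;
	if (!plug->cur_ktime) {
		plug->cur_ktime = model_clock_ns;
		t->block_ts = true;
	}
	return plug->cur_ktime;
}

/* Drop the cached timestamp so the next reader fetches a fresh one. */
static void model_invalidate_ts(struct model_task *t)
{
	if (t->block_ts && t->plug) {
		t->plug->cur_ktime = 0;
		t->block_ts = false;
	}
}

/* Models the schedule() path: off-CPU for a while, then invalidate. */
static void model_schedule(struct model_task *t, uint64_t off_cpu_ns)
{
	model_clock_ns += off_cpu_ns;
	model_invalidate_ts(t);
}

/* Models the cond_resched() path: off-CPU for a while, no invalidation. */
static void model_cond_resched(struct model_task *t, uint64_t off_cpu_ns)
{
	(void)t;
	model_clock_ns += off_cpu_ns;
}

/* How far behind the clock is the timestamp seen after being off-CPU? */
static uint64_t model_staleness_ns(bool via_cond_resched, uint64_t off_cpu_ns)
{
	struct model_plug plug = { .cur_ktime = 0 };
	struct model_task task = { .plug = &plug, .block_ts = false };
	uint64_t seen;

	model_clock_ns = 1000;			/* nonzero, since 0 means "unset" */
	(void)model_blk_time_get_ns(&task);	/* caches the start time */

	if (via_cond_resched)
		model_cond_resched(&task, off_cpu_ns);
	else
		model_schedule(&task, off_cpu_ns);

	seen = model_blk_time_get_ns(&task);
	assert(seen <= model_clock_ns);
	return model_clock_ns - seen;
}
```

In this model, model_staleness_ns(true, ...) returns the full off-CPU time
because nothing on that path clears the cache, while
model_staleness_ns(false, ...) returns 0. That matches the millisecond-scale
stale cur_ktime values Matt reported, assuming the task was scheduled out
through cond_resched() while holding a plug.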