On 01.04.25 14:36, Fernand Sieber wrote:
> If a task yields, the scheduler may decide to pick it again. The task
> in turn may decide to yield immediately or shortly after, leading to
> a tight loop of yields.
>
> If there's another runnable task at this point, the deadline will be
> increased by the slice at each loop. This can cause the deadline to
> run away pretty quickly, and cause elevated run delays later on, as
> the task doesn't get picked again. The reason the scheduler can pick
> the same task again and again despite its deadline increasing is that
> it may be the only eligible task at that point.
>
> Fix this by updating the deadline only to one slice ahead.
>
> Note, we might want to consider iterating on the implementation of
> yield as a follow-up:
>
> * the yielding task could forfeit its remaining slice by incrementing
>   its vruntime correspondingly
> * in the case of yield_to, the yielding task could donate its
>   remaining slice to the target task
>
> Signed-off-by: Fernand Sieber <sieberf@xxxxxxxxxx>
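For readers who want to map this to the source, the change amounts to
something like the following in yield_task_fair() (a sketch paraphrased
from kernel/sched/fair.c; the exact upstream diff may differ):

static void yield_task_fair(struct rq *rq)
{
	struct task_struct *curr = rq->curr;
	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
	struct sched_entity *se = &curr->se;

	/* Nothing to do if we are the only runnable task. */
	if (unlikely(rq->nr_running == 1))
		return;

	clear_buddies(cfs_rq, se);

	update_rq_clock(rq);
	/* Update run-time statistics of the 'current' task. */
	update_curr(cfs_rq);
	/* Avoid a duplicate clock update in the upcoming schedule(). */
	rq_clock_skip_update(rq);

	/*
	 * Previously each yield pushed the deadline a full slice further
	 * out:
	 *
	 *	se->deadline += calc_delta_fair(se->slice, se);
	 *
	 * so a tight yield loop could grow it without bound. Re-anchoring
	 * it to the current vruntime keeps it at most one slice ahead no
	 * matter how often the task yields:
	 */
	se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
}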
IMHO it's worth noting that this is not a theoretical issue; we have seen it in real life: a KVM virtual machine's vCPU that runs into a busy guest spinlock calls kvm_vcpu_yield_to(), which eventually ends up in yield_task_fair(). We have seen such spinlocks caused by guest-side contention rather than host overcommit, which means we go into a loop of vCPU execution and spin-loop exit, and that loop results in an undesirable increase in the vCPU thread's deadline.
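To make the effect concrete, here is a tiny user-space simulation of the
deadline arithmetic described above (illustrative only; the 3 ms slice
and nanosecond units are my assumptions, not taken from the patch):

#include <stdio.h>

int main(void)
{
	/* Illustrative values: vruntime in ns, a 3 ms slice. */
	unsigned long long vruntime = 0;
	const unsigned long long slice = 3000000ULL;
	unsigned long long deadline = vruntime + slice;

	/* Old behaviour: every yield in the tight loop adds a slice. */
	for (int yields = 0; yields < 1000; yields++)
		deadline += slice;
	printf("old: deadline is %llu ns ahead after 1000 yields\n",
	       deadline - vruntime);

	/* Fixed behaviour: the deadline is re-anchored on each yield,
	 * so it never drifts more than one slice ahead. */
	deadline = vruntime + slice;
	printf("fixed: deadline is %llu ns ahead regardless of yields\n",
	       deadline - vruntime);
	return 0;
}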
Given that this impacts real workloads and the bug has been present since the introduction of EEVDF, I would say it warrants a
Fixes: 147f3efaa24182 ("sched/fair: Implement an EEVDF-like scheduling policy")
tag.

Alex