>On 01/04/25 18:06, Fernand Sieber wrote:
>> If a task yields, the scheduler may decide to pick it again. The task in
>> turn may decide to yield immediately or shortly after, leading to a tight
>> loop of yields.
>>
>> If there's another runnable task at this point, the deadline will be
>> increased by the slice at each loop. This can cause the deadline to run
>> away pretty quickly, leading to elevated run delays later on, as the task
>> doesn't get picked again. The reason the scheduler can pick the same task
>> again and again despite its deadline increasing is that it may be the
>> only eligible task at that point.
>>
>> Fix this by updating the deadline only to one slice ahead.
>>
>> Note, we might want to consider iterating on the implementation of yield
>> as a follow-up:
>> * the yielding task could forfeit its remaining slice by incrementing
>>   its vruntime correspondingly
>> * in the case of yield_to, the yielding task could donate its remaining
>>   slice to the target task
>>
>> Signed-off-by: Fernand Sieber <sieberf@xxxxxxxxxx>
>
>IMHO it's worth noting that this is not a theoretical issue. We have
>seen this in real life: a KVM virtual machine's vCPU that runs into a
>busy guest spin lock calls kvm_vcpu_yield_to(), which eventually ends up
>in the yield_task_fair() function. We have seen such spin locks due to
>guest contention rather than host overcommit, which means we go into a
>loop of vCPU execution and spin-loop exit, resulting in an undesirable
>increase in the vCPU thread's deadline.
>
>Given this impacts real workloads and is a bug present since the
>introduction of EEVDF, I would say it warrants a
>Fixes: 147f3efaa24182 ("sched/fair: Implement an EEVDF-like scheduling
>policy")
>tag.
>
>Alex

Actually, as Alex described, we encountered the same issue in the
following test scenario: start qemu, bind it to a cpuset group by setting
cpuset.cpus=1-3, and run taskset -c 1-3 ./stress-ng -c 20 for stress
testing; qemu then freezes, reporting a soft lockup in qemu. After
applying this patch, the problem was resolved. Are there plans to merge
this patch into mainline?
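
To make the mechanics concrete, here is a rough sketch of the approach
the commit message describes; it is not necessarily the exact hunk in
the patch. yield_task_fair() currently ends by pushing the deadline out
by one slice on every yield:

	se->deadline += calc_delta_fair(se->slice, se);

so a tight yield loop keeps adding one slice per iteration. Updating the
deadline "only to one slice ahead" would instead anchor it to the
current vruntime, something along the lines of:

	se->deadline = se->vruntime + calc_delta_fair(se->slice, se);

which caps the deadline penalty at a single slice no matter how many
times the task yields and gets re-picked as the only eligible task.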