On 6/13/2025 1:35 PM, Joel Fernandes wrote:
> 
> 
> On 6/13/2025 1:17 AM, Joel Fernandes wrote:
>> sched_ext tasks currently are starved by RT hoggers especially since RT
>> throttling was replaced by deadline servers to boost only CFS tasks. Several
>> users in the community have reported issues with RT stalling sched_ext tasks.
>> Add a sched_ext deadline server as well so that sched_ext tasks are also
>> boosted and do not suffer starvation.
>>
>> A kselftest is also provided to verify the starvation issues are now fixed.
>>
>> Btw, there is still something funky going on with CPU hotplug and the
>> relinquish patch. Sometimes the sched_ext's hotplug self-test locks up
>> (./runner -t hotplug). Reverting that patch fixes it, so I am suspecting
>> something is off in dl_server_remove_params() when it is being called on
>> offline CPUs.
>
> I think I got somewhere here with this sched_ext hotplug test but still not
> there yet. Juri, Andrea, Tejun, can you take a look at the below when you get a
> chance?

The following patch makes the sched_ext hotplug test reliably pass for me now.
Thoughts?

From: Joel Fernandes <joelagnelf@xxxxxxxxxx>
Subject: [PATCH] sched/deadline: Prevent setting server as started if params couldn't be applied

In the call path below, dl_server_apply_params() fails because dl_bw_cpus()
returns 0 while the CPU is still being onlined:

[   11.878356] ------------[ cut here ]------------
[   11.882592]  <TASK>
[   11.882685]  enqueue_task_scx+0x190/0x280
[   11.882802]  ttwu_do_activate+0xaa/0x2a0
[   11.882925]  try_to_wake_up+0x371/0x600
[   11.883047]  cpuhp_bringup_ap+0xd6/0x170
[   11.883172]  cpuhp_invoke_callback+0x142/0x540
[   11.883327]  _cpu_up+0x15b/0x270
[   11.883450]  cpu_up+0x52/0xb0
[   11.883576]  cpu_subsys_online+0x32/0x120
[   11.883704]  online_store+0x98/0x130
[   11.883824]  kernfs_fop_write_iter+0xeb/0x170
[   11.883972]  vfs_write+0x2c7/0x430
[   11.884091]  ksys_write+0x70/0xe0
[   11.884209]  do_syscall_64+0xd6/0x250
[   11.884327]  ? clear_bhb_loop+0x40/0x90
[   11.884443]  entry_SYSCALL_64_after_hwframe+0x77/0x7f

It seems too early to start the server at this point. Simply defer starting
the server to the next enqueue if dl_server_apply_params() returns an error.
In any case, we should not pretend that the server has started; doing so does
seem to break the sched_ext CPU hotplug test. For the same reason, make
dl_server_stop() return early when the server is not active, since there is
nothing to dequeue in that case.

With this, the sched_ext hotplug test reliably passes.

Signed-off-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>
---
 kernel/sched/deadline.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f0cd1dbca4b8..8dd0c6d71489 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1657,8 +1657,8 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 		u64 runtime = 50 * NSEC_PER_MSEC;
 		u64 period = 1000 * NSEC_PER_MSEC;
 
-		dl_server_apply_params(dl_se, runtime, period, 1);
-
+		if (dl_server_apply_params(dl_se, runtime, period, 1))
+			return;
 		dl_se->dl_server = 1;
 		dl_se->dl_defer = 1;
 		setup_new_dl_entity(dl_se);
@@ -1675,7 +1675,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 
 void dl_server_stop(struct sched_dl_entity *dl_se)
 {
-	if (!dl_se->dl_runtime)
+	if (!dl_se->dl_runtime || !dl_se->dl_server_active)
 		return;
 
 	dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);