Hi Joel,

On Fri, Jun 13, 2025 at 02:05:03PM -0400, Joel Fernandes wrote:
>
>
> On 6/13/2025 1:35 PM, Joel Fernandes wrote:
> >
> >
> > On 6/13/2025 1:17 AM, Joel Fernandes wrote:
> >> sched_ext tasks are currently starved by RT hoggers, especially since RT
> >> throttling was replaced by deadline servers that boost only CFS tasks.
> >> Several users in the community have reported issues with RT stalling
> >> sched_ext tasks. Add a sched_ext deadline server as well so that sched_ext
> >> tasks are also boosted and do not suffer starvation.
> >>
> >> A kselftest is also provided to verify that the starvation issues are now
> >> fixed.
> >>
> >> Btw, there is still something funky going on with CPU hotplug and the
> >> relinquish patch. Sometimes sched_ext's hotplug self-test locks up
> >> (./runner -t hotplug). Reverting that patch fixes it, so I am suspecting
> >> something is off in dl_server_remove_params() when it is being called on
> >> offline CPUs.
> >
> > I think I got somewhere with the sched_ext hotplug test, but I'm not all
> > the way there yet. Juri, Andrea, Tejun, can you take a look at the below
> > when you get a chance?
>
> The following patch makes the sched_ext hotplug test reliably pass for me now.
> Thoughts?

For me it gets stuck here, when the hotplug test tries to bring the CPU
offline:

 TEST: hotplug
 DESCRIPTION: Verify hotplug behavior
 OUTPUT:
 [ 5.042497] smpboot: CPU 1 is now offline
 [ 5.069691] sched_ext: BPF scheduler "hotplug_cbs" enabled
 [ 5.108705] smpboot: Booting Node 0 Processor 1 APIC 0x1
 [ 5.149484] sched_ext: BPF scheduler "hotplug_cbs" disabled (unregistered from BPF)
 EXIT: unregistered from BPF (hotplug event detected (1 going online))
 [ 5.204500] sched_ext: BPF scheduler "hotplug_cbs" enabled
 Failed to bring CPU offline (Device or resource busy)

However, if I don't stop rq->fair_server in the scx_switching_all case,
everything seems to work (though I still don't understand why).

I didn't have much time to look at this today; I'll investigate more
tomorrow.

-Andrea

> From: Joel Fernandes <joelagnelf@xxxxxxxxxx>
> Subject: [PATCH] sched/deadline: Prevent setting server as started if params couldn't be applied
>
> In the following call trace, dl_server_apply_params() fails because
> dl_bw_cpus() is 0 while the CPU is being brought online:
>
> [ 11.878356] ------------[ cut here ]------------
> [ 11.882592] <TASK>
> [ 11.882685] enqueue_task_scx+0x190/0x280
> [ 11.882802] ttwu_do_activate+0xaa/0x2a0
> [ 11.882925] try_to_wake_up+0x371/0x600
> [ 11.883047] cpuhp_bringup_ap+0xd6/0x170
> [ 11.883172] cpuhp_invoke_callback+0x142/0x540
> [ 11.883327] _cpu_up+0x15b/0x270
> [ 11.883450] cpu_up+0x52/0xb0
> [ 11.883576] cpu_subsys_online+0x32/0x120
> [ 11.883704] online_store+0x98/0x130
> [ 11.883824] kernfs_fop_write_iter+0xeb/0x170
> [ 11.883972] vfs_write+0x2c7/0x430
> [ 11.884091] ksys_write+0x70/0xe0
> [ 11.884209] do_syscall_64+0xd6/0x250
> [ 11.884327] ? clear_bhb_loop+0x40/0x90
> [ 11.884443] entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> It seems too early to start the server at this point. Simply defer starting
> the server to the next enqueue if dl_server_apply_params() returns an error.
> In any case, we should not pretend that the server has started when it has
> not, and doing so does seem to mess up the sched_ext CPU hotplug test.
>
> With this, the sched_ext hotplug test reliably passes.
>
> Signed-off-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>
> ---
>  kernel/sched/deadline.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index f0cd1dbca4b8..8dd0c6d71489 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1657,8 +1657,8 @@ void dl_server_start(struct sched_dl_entity *dl_se)
>  	u64 runtime =  50 * NSEC_PER_MSEC;
>  	u64 period = 1000 * NSEC_PER_MSEC;
>
> -	dl_server_apply_params(dl_se, runtime, period, 1);
> -
> +	if (dl_server_apply_params(dl_se, runtime, period, 1))
> +		return;
>  	dl_se->dl_server = 1;
>  	dl_se->dl_defer = 1;
>  	setup_new_dl_entity(dl_se);
> @@ -1675,7 +1675,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
>
>  void dl_server_stop(struct sched_dl_entity *dl_se)
>  {
> -	if (!dl_se->dl_runtime)
> +	if (!dl_se->dl_runtime || !dl_se->dl_server_active)
>  		return;
>
>  	dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
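
As an aside, for anyone not familiar with the admission-control side of this,
below is a rough, self-contained user-space sketch (not the actual kernel
implementation; apply_server_params(), the BW_SHIFT fixed-point model and the
one-capacity-unit-per-CPU assumption are simplified stand-ins for
illustration) of why dl_server_apply_params() can fail here: the requested
server bandwidth has to fit into the capacity contributed by the CPUs that
dl_bw_cpus() reports, and while a CPU is still being brought online that count
can be 0, so there is nothing to admit the bandwidth into and the request is
refused.

/*
 * Self-contained user-space sketch (NOT kernel code) of the admission
 * check idea behind dl_server_apply_params(). Names and constants are
 * simplified assumptions made for illustration only.
 */
#include <stdint.h>
#include <stdio.h>

#define BW_SHIFT	20		/* fixed-point shift for bandwidth ratios */
#define NSEC_PER_MSEC	1000000ULL

/* runtime/period as a fixed-point fraction, roughly what to_ratio() does */
static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	return (runtime << BW_SHIFT) / period;
}

/*
 * Simplified admission check: the new server bandwidth must fit in the
 * capacity contributed by the online CPUs of the domain. With cpus == 0
 * (as seen while a CPU is still being brought online) nothing fits, so
 * the request is rejected -- mirroring dl_server_apply_params() failing.
 */
static int apply_server_params(int cpus, uint64_t total_bw,
			       uint64_t runtime, uint64_t period)
{
	uint64_t new_bw = to_ratio(period, runtime);
	uint64_t cap = (uint64_t)cpus << BW_SHIFT;	/* one unit of capacity per CPU */

	if (total_bw + new_bw > cap)
		return -1;			/* admission refused (think -EBUSY) */
	return 0;
}

int main(void)
{
	uint64_t runtime = 50 * NSEC_PER_MSEC;
	uint64_t period = 1000 * NSEC_PER_MSEC;

	/* Normal case: one online CPU, a 5% bandwidth request is admitted. */
	printf("cpus=1 -> %d\n", apply_server_params(1, 0, runtime, period));

	/* Onlining window: dl_bw_cpus() == 0, the same request is refused. */
	printf("cpus=0 -> %d\n", apply_server_params(0, 0, runtime, period));

	return 0;
}

With the change above, such a refusal simply makes dl_server_start() bail out,
and the start is retried on the next enqueue once the CPU is fully accounted
for in the root domain.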