sched_ext tasks can be starved by long-running RT tasks, especially since RT throttling was replaced by deadline servers to boost only SCHED_NORMAL tasks. Several users in the community have reported issues with RT stalling sched_ext tasks. This is fairly common on distributions or environments where applications like video compositors, audio services, etc. run as RT tasks by default. Example trace (showing a per-CPU kthread stalled due to the sway Wayland compositor running as an RT task): runnable task stall (kworker/0:0[106377] failed to run for 5.043s) ... CPU 0 : nr_run=3 flags=0xd cpu_rel=0 ops_qseq=20646200 pnt_seq=45388738 curr=sway[994] class=rt_sched_class R kworker/0:0[106377] -5043ms scx_state/flags=3/0x1 dsq_flags=0x0 ops_state/qseq=0/0 sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=0 slice=20000000 cpus=01 This is often perceived as a bug in the BPF schedulers, but in reality they can't do much: RT tasks run outside their control and can potentially consume 100% of the CPU bandwidth. Fix this by adding a sched_ext deadline server as well so that sched_ext tasks are also boosted and do not suffer starvation. Two kselftests are also provided to verify the starvation fixes and bandwidth allocation is correct. This patchset is also available in the following git branch: git://git.kernel.org/pub/scm/linux/kernel/git/arighi/linux.git scx-dl-server Changes in v8: - Add tj's patch to de-couple balance and pick_task and avoid changing sched/core callbacks to propagate @rf - Simplify dl_se->dl_server check (suggested by PeterZ) - Small coding style fixes in the kselftests - Link to v7: https://lore.kernel.org/all/20250809184800.129831-1-joelagnelf@xxxxxxxxxx/ Changes in v7: - Rebased to Linus master - Link to v6: https://lore.kernel.org/all/20250702232944.3221001-1-joelagnelf@xxxxxxxxxx/ Changes in v6: - Added Acks to few patches - Fixes to few nits suggested by Tejun - Link to v5: https://lore.kernel.org/all/20250620203234.3349930-1-joelagnelf@xxxxxxxxxx/ Changes in v5: - Added a kselftest (total_bw) to sched_ext to verify bandwidth values from debugfs - Address comment from Andrea about redundant rq clock invalidation - Link to v4: https://lore.kernel.org/all/20250617200523.1261231-1-joelagnelf@xxxxxxxxxx/ Changes in v4: - Fixed issues with hotplugged CPUs having their DL server bandwidth altered due to loading SCX - Fixed other issues - Rebased on Linus master - All sched_ext kselftests reliably pass now, also verified that the total_bw in debugfs (CONFIG_SCHED_DEBUG) is conserved with these patches - Link to v3: https://lore.kernel.org/all/20250613051734.4023260-1-joelagnelf@xxxxxxxxxx/ Changes in v3: - Removed code duplication in debugfs. Made ext interface separate - Fixed issue where rq_lock_irqsave was not used in the relinquish patch - Fixed running bw accounting issue in dl_server_remove_params - Link to v2: https://lore.kernel.org/all/20250602180110.816225-1-joelagnelf@xxxxxxxxxx/ Changes in v2: - Fixed a hang related to using rq_lock instead of rq_lock_irqsave - Added support to remove BW of DL servers when they are switched to/from EXT - Link to v1: https://lore.kernel.org/all/20250315022158.2354454-1-joelagnelf@xxxxxxxxxx/ Andrea Righi (6): sched_ext: Exit early on hotplug events during attach sched/deadline: Add support to remove DL server's bandwidth contribution sched/deadline: Account ext server bandwidth sched/deadline: Allow to initialize DL server when needed sched_ext: Selectively enable ext and fair DL servers selftests/sched_ext: Add test for sched_ext dl_server Joel Fernandes (9): sched/debug: Fix updating of ppos on server write ops sched/debug: Stop and start server based on if it was active sched/deadline: Clear the defer params sched/deadline: Return EBUSY if dl_bw_cpus is zero sched: Add a server arg to dl_server_update_idle_time() sched_ext: Add a DL server for sched_ext tasks sched/debug: Add support to change sched_ext server params sched/deadline: Fix DL server crash in inactive_timer callback selftests/sched_ext: Add test for DL server total_bw consistency Tejun Heo (1): sched/deadline: De-couple balance and pick_task include/linux/sched.h | 2 + kernel/sched/core.c | 17 +- kernel/sched/deadline.c | 152 +++++++++--- kernel/sched/debug.c | 161 ++++++++++--- kernel/sched/ext.c | 175 ++++++++++++-- kernel/sched/fair.c | 4 +- kernel/sched/idle.c | 2 +- kernel/sched/sched.h | 15 +- kernel/sched/topology.c | 5 + tools/testing/selftests/sched_ext/Makefile | 2 + tools/testing/selftests/sched_ext/rt_stall.bpf.c | 23 ++ tools/testing/selftests/sched_ext/rt_stall.c | 214 +++++++++++++++++ tools/testing/selftests/sched_ext/total_bw.c | 281 +++++++++++++++++++++++ 13 files changed, 968 insertions(+), 85 deletions(-) create mode 100644 tools/testing/selftests/sched_ext/rt_stall.bpf.c create mode 100644 tools/testing/selftests/sched_ext/rt_stall.c create mode 100644 tools/testing/selftests/sched_ext/total_bw.c