Re: [PATCH v3 00/10] Add a deadline server for sched_ext tasks

On 6/13/2025 1:35 PM, Joel Fernandes wrote:
> 
> 
> On 6/13/2025 1:17 AM, Joel Fernandes wrote:
>> sched_ext tasks currently are starved by RT hoggers especially since RT
>> throttling was replaced by deadline servers to boost only CFS tasks. Several
>> users in the community have reported issues with RT stalling sched_ext tasks.
>> Add a sched_ext deadline server as well so that sched_ext tasks are also
>> boosted and do not suffer starvation.
>>
>> A kselftest is also provided to verify the starvation issues are now fixed.
>>
>> Btw, there is still something funky going on with CPU hotplug and the
>> relinquish patch. Sometimes the sched_ext's hotplug self-test locks up
>> (./runner -t hotplug). Reverting that patch fixes it, so I am suspecting
>> something is off in dl_server_remove_params() when it is being called on
>> offline CPUs.
> 
> I think I got somewhere here with this sched_ext hotplug test but still not
> there yet. Juri, Andrea, Tejun, can you take a look at the below when you get a
> chance?

The following patch makes the sched_ext hotplug test reliably pass for me now.
Thoughts?

From: Joel Fernandes <joelagnelf@xxxxxxxxxx>
Subject: [PATCH] sched/deadline: Prevent setting server as started if params
 couldn't be applied

In the call path below, dl_server_apply_params() fails because
dl_bw_cpus() is 0 during CPU onlining.

[   11.878356] ------------[ cut here ]------------
[   11.882592]  <TASK>
[   11.882685]  enqueue_task_scx+0x190/0x280
[   11.882802]  ttwu_do_activate+0xaa/0x2a0
[   11.882925]  try_to_wake_up+0x371/0x600
[   11.883047]  cpuhp_bringup_ap+0xd6/0x170
[   11.883172]  cpuhp_invoke_callback+0x142/0x540
[   11.883327]  _cpu_up+0x15b/0x270
[   11.883450]  cpu_up+0x52/0xb0
[   11.883576]  cpu_subsys_online+0x32/0x120
[   11.883704]  online_store+0x98/0x130
[   11.883824]  kernfs_fop_write_iter+0xeb/0x170
[   11.883972]  vfs_write+0x2c7/0x430
[   11.884091]  ksys_write+0x70/0xe0
[   11.884209]  do_syscall_64+0xd6/0x250
[   11.884327]  ? clear_bhb_loop+0x40/0x90
[   11.884443]  entry_SYSCALL_64_after_hwframe+0x77/0x7f

It seems too early to start the server at this point. Simply defer
starting the server to the next enqueue if dl_server_apply_params()
returns an error. In any case, we should not pretend that the server
started, since doing so appears to break the sched_ext CPU hotplug test.

With this, the sched_ext hotplug test reliably passes.

Signed-off-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>
---
 kernel/sched/deadline.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f0cd1dbca4b8..8dd0c6d71489 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1657,8 +1657,8 @@ void dl_server_start(struct sched_dl_entity *dl_se)
                u64 runtime =  50 * NSEC_PER_MSEC;
                u64 period = 1000 * NSEC_PER_MSEC;

-               dl_server_apply_params(dl_se, runtime, period, 1);
-
+               if (dl_server_apply_params(dl_se, runtime, period, 1))
+                       return;
                dl_se->dl_server = 1;
                dl_se->dl_defer = 1;
                setup_new_dl_entity(dl_se);
@@ -1675,7 +1675,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)

 void dl_server_stop(struct sched_dl_entity *dl_se)
 {
-       if (!dl_se->dl_runtime)
+       if (!dl_se->dl_runtime || !dl_se->dl_server_active)
                return;

        dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
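
For anyone who wants to poke at the control flow outside the kernel, here is a
rough userspace sketch of what the two hunks above are meant to do. It is a toy
model, not kernel code: struct toy_server, toy_apply_params(),
toy_server_start(), toy_server_stop() and toy_hotplug_scenario() are all
made-up names, the -1 error value is an assumption, and the bw_cpus argument
merely stands in for whatever dl_bw_cpus() would report. The part of the start
path outside the hunk is guessed from context.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the server state; NOT the kernel's struct sched_dl_entity. */
struct toy_server {
	bool configured;		/* models dl_se->dl_server */
	bool active;			/* models dl_se->dl_server_active */
	unsigned long long runtime;	/* models dl_se->dl_runtime, in ns */
};

/* Models dl_server_apply_params(): rejected while no CPUs are accounted. */
int toy_apply_params(struct toy_server *s, int bw_cpus,
		     unsigned long long runtime)
{
	if (bw_cpus == 0)
		return -1;	/* assumed error value, for illustration only */
	s->runtime = runtime;
	return 0;
}

/* Patched start path: do not mark the server set up if params were rejected. */
void toy_server_start(struct toy_server *s, int bw_cpus)
{
	if (!s->configured) {
		/* 50 ms expressed in ns, mirroring 50 * NSEC_PER_MSEC */
		if (toy_apply_params(s, bw_cpus, 50ULL * 1000 * 1000))
			return;	/* deferred: the next enqueue tries again */
		s->configured = true;
	}

	if (!s->runtime || s->active)
		return;
	s->active = true;
}

/* Patched stop path: true only if a dequeue would really be performed. */
bool toy_server_stop(struct toy_server *s)
{
	if (!s->runtime || !s->active)
		return false;
	s->active = false;
	return true;
}

/* Walks through the hotplug ordering described in the commit message. */
int toy_hotplug_scenario(void)
{
	struct toy_server s = { 0 };

	toy_server_start(&s, 0);	/* enqueue while CPU is still onlining */
	assert(!s.configured && !s.active);
	assert(!toy_server_stop(&s));	/* nothing was enqueued, so no dequeue */

	toy_server_start(&s, 4);	/* later enqueue, bandwidth now visible */
	assert(s.configured && s.active);
	assert(toy_server_stop(&s));
	return 0;
}
```

Calling toy_hotplug_scenario() from a test harness exercises both orderings:
the early enqueue leaves the server untouched, and the later one starts it
normally. The point of the model is only that a failed parameter setup must
leave no "started" state behind for the stop path to trip over.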



