Re: [PATCH] ssdd: mitigate tracee starvation

Crystal Wood <crwood@xxxxxxxxxx> · Wed, 20 Aug 2025 15:18:17 -0500

On Wed, 2025-08-20 at 12:18 -0400, Derek Barbosa wrote:
> When ssdd is invoked with nforks > 100 && niters == 10000 on a tuned,
> realtime kernel, the following error messages can be seen:
> 
> forktest#4/8719: EXITING, ERROR: wait on PTRACE_SINGLESTEP #385: no SIGCHLD seen (signal count == 0), signo 5
> forktest#1/8716: EXITING, ERROR: wait on PTRACE_SINGLESTEP #398: no SIGCHLD seen (signal count == 0), signo 5
> forktest#6/8721: EXITING, ERROR: wait on PTRACE_SINGLESTEP #385: no SIGCHLD seen (signal count == 0), signo 5
> forktest#10/8725: EXITING, ERROR: wait on PTRACE_SINGLESTEP #388: no SIGCHLD seen (signal count == 0), signo 5
> forktest#11/8726: EXITING, ERROR: wait on PTRACE_SINGLESTEP #388: no SIGCHLD seen (signal count == 0), signo 5
> forktest#12/8727: EXITING, ERROR: wait on PTRACE_SINGLESTEP #389: no SIGCHLD seen (signal count == 0), signo 5
> forktest#14/8729: EXITING, ERROR: wait on PTRACE_SINGLESTEP #389: no SIGCHLD seen (signal count == 0), signo 5
> forktest#15/8730: EXITING, ERROR: wait on PTRACE_SINGLESTEP #389: no SIGCHLD seen (signal count == 0), signo 5
> 
> This behavior is caused by ptrace_stop() being unable to sleep after taking
> tasklist_lock().
> 
> As forktest() generates "niter" PTRACE_SINGLESTEPs for nforks, in the event
> where nforks >= 100, the sporadic test failures caused by missing SIGCHLDs
> indicate that the tracees are unable to effectively wait for their asynchronous
> signals to arrive, as denoted by the existing sleeps in check_sigchld().
> 
> Therefore, by performing an additional sleep() in check_sigchld(), we give the
> tracee enough CPU time to call do_notify_parent_cldstop()->send_signal_locked().
> 
> After applying this patch, the aforementioned issue is mitigated in
> scenarios with a high number of nforks.
> 
> Suggested-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> Signed-off-by: Derek Barbosa <debarbos@xxxxxxxxxx>
> ---
>  src/ssdd/ssdd.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/src/ssdd/ssdd.c b/src/ssdd/ssdd.c
> index 50f7424..7fdb039 100644
> --- a/src/ssdd/ssdd.c
> +++ b/src/ssdd/ssdd.c
> @@ -145,6 +145,15 @@ static int check_sigchld(void)
>  	for (i = 0; i < 10 && !got_sigchld; i++)
>  		usleep(16000); /* 160 + 150 = 310 msecs */
>  
> +	/*
> +	 * In the _worst case scenario_ where the signal still
> +	 * has not arrived: the tracee is starved or
> +	 * preempted, and needs more CPU time.
> +	 */
> +	if (!got_sigchld) {
> +		sleep(1);
> +	}

And then down the road we'll hit a load high enough that an extra second
isn't enough...

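How about replacing this whole thing with a call to sigtimedwait()?
It returns as soon as the signal arrives, so a generous timeout costs
nothing in the common case, which matters especially if the goal is to
do the steps "as fast as possible".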
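
Something along these lines, as a rough sketch only. block_sigchld() and
wait_sigchld() are made-up names, not existing ssdd functions, and I'm
assuming SIGCHLD can be kept blocked in the tracer so that it stays pending
until we ask for it (which also means a handler that sets got_sigchld would
no longer run for it):

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <time.h>

/*
 * Hypothetical helper (not in ssdd): block SIGCHLD so that it stays
 * pending instead of being delivered asynchronously. This has to happen
 * before the tracee gets a chance to stop.
 */
static int block_sigchld(void)
{
	sigset_t set;

	sigemptyset(&set);
	sigaddset(&set, SIGCHLD);
	return sigprocmask(SIG_BLOCK, &set, NULL);
}

/*
 * Hypothetical stand-in for the usleep() polling loop: wait for a
 * pending SIGCHLD for at most timeout_ms milliseconds.
 * Returns 1 if SIGCHLD arrived, 0 on timeout, -1 on any other error.
 */
static int wait_sigchld(long timeout_ms)
{
	sigset_t set;
	siginfo_t info;
	struct timespec ts = {
		.tv_sec = timeout_ms / 1000,
		.tv_nsec = (timeout_ms % 1000) * 1000000L,
	};

	sigemptyset(&set);
	sigaddset(&set, SIGCHLD);

	for (;;) {
		if (sigtimedwait(&set, &info, &ts) == SIGCHLD)
			return 1;
		if (errno == EAGAIN)
			return 0;	/* timed out, nothing arrived */
		if (errno != EINTR)
			return -1;
		/* Interrupted by an unrelated handler: retry (full timeout again). */
	}
}
```

check_sigchld() could then call something like wait_sigchld(5000) in place
of the loop. It returns immediately when the signal is already pending, so
the timeout only bounds the failure case and can be as large as we like.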

-Crystal