Re: [PATCH] ssdd: mitigate tracee starvation

Derek Barbosa <debarbos@xxxxxxxxxx> · Wed, 20 Aug 2025 16:53:38 -0400

On Wed, Aug 20, 2025 at 03:18:17PM -0500, Crystal Wood wrote:
> On Wed, 2025-08-20 at 12:18 -0400, Derek Barbosa wrote:
> > When ssdd is invoked with nforks > 100 && niters == 10000 on a tuned,
> > realtime kernel, the following error messages can be seen:
> > 
> > forktest#4/8719: EXITING, ERROR: wait on PTRACE_SINGLESTEP #385: no SIGCHLD seen (signal count == 0), signo 5
> > forktest#1/8716: EXITING, ERROR: wait on PTRACE_SINGLESTEP #398: no SIGCHLD seen (signal count == 0), signo 5
> > forktest#6/8721: EXITING, ERROR: wait on PTRACE_SINGLESTEP #385: no SIGCHLD seen (signal count == 0), signo 5
> > forktest#10/8725: EXITING, ERROR: wait on PTRACE_SINGLESTEP #388: no SIGCHLD seen (signal count == 0), signo 5
> > forktest#11/8726: EXITING, ERROR: wait on PTRACE_SINGLESTEP #388: no SIGCHLD seen (signal count == 0), signo 5
> > forktest#12/8727: EXITING, ERROR: wait on PTRACE_SINGLESTEP #389: no SIGCHLD seen (signal count == 0), signo 5
> > forktest#14/8729: EXITING, ERROR: wait on PTRACE_SINGLESTEP #389: no SIGCHLD seen (signal count == 0), signo 5
> > forktest#15/8730: EXITING, ERROR: wait on PTRACE_SINGLESTEP #389: no SIGCHLD seen (signal count == 0), signo 5
> > 
> > This behavior is caused by ptrace_stop() being unable to sleep after taking
> > tasklist_lock().
> > 
> > As forktest() generates "niter" PTRACE_SINGLESTEP's for nforks, in the event
> > where nforks >= 100, the sporadic test failures caused by missing SIGCHLDs
> > indicate that the tracees are unable to effectively wait for their asynchronous
> > signals to arrive, as denoted in the previous sleeps for check_sigchld().
> > 
> > Therefore, by performing an additional sleep() in check_sigchld(), we give the
> > tracee enough CPU time to call do_notify_parent_cldstop()->send_signal_locked().
> > 
> > The observed behavior after applying this patch mitigates the aforementioned
> > issue in scenarios with a high number of nforks.
> > 
> > Suggested-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> > Signed-off-by: Derek Barbosa <debarbos@xxxxxxxxxx>
> > ---
> >  src/ssdd/ssdd.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/src/ssdd/ssdd.c b/src/ssdd/ssdd.c
> > index 50f7424..7fdb039 100644
> > --- a/src/ssdd/ssdd.c
> > +++ b/src/ssdd/ssdd.c
> > @@ -145,6 +145,15 @@ static int check_sigchld(void)
> >  	for (i = 0; i < 10 && !got_sigchld; i++)
> >  		usleep(16000); /* 160 + 150 = 310 msecs */
> >  
> > +        /*
> > +         * In the _worst case scenario_ where the signal still
> > +         * has not arrived: the tracee is starved or
> > +	 * preempted, and needs more CPU time.
> > +         */
> > +        if(!got_sigchld){
> > +		sleep(1);
> > +	}
> 
> And then down the road we'll hit a load high enough that an extra second
> isn't enough...
> 

FWIW, the default in the source file is niters = 10. The errors above
reproduce more consistently when niters is >= 1000.

> How about replacing this whole thing with a call to sigtimedwait()? 
> Especially if the goal is to do the steps "as fast as possible".
> 
> -Crystal
> 
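
For reference, a minimal sketch of what a sigtimedwait()-based wait might look
like. This is not taken from ssdd.c: block_sigchld() and wait_for_sigchld() are
hypothetical helper names, and the sketch assumes SIGCHLD is kept blocked in
the tracer so the signal stays pending until it is collected, rather than being
delivered to a handler that sets got_sigchld.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <time.h>

/*
 * Hypothetical helper, not part of ssdd.c: block SIGCHLD so that it stays
 * pending until sigtimedwait() collects it. Call once before forking tracees.
 */
static int block_sigchld(void)
{
	sigset_t set;

	sigemptyset(&set);
	sigaddset(&set, SIGCHLD);
	return sigprocmask(SIG_BLOCK, &set, NULL);
}

/*
 * Hypothetical helper: wait up to timeout_ms for a pending SIGCHLD.
 * Returns 1 if SIGCHLD was collected, 0 on timeout, -1 on any other error.
 */
static int wait_for_sigchld(long timeout_ms)
{
	sigset_t set;
	struct timespec ts = {
		.tv_sec = timeout_ms / 1000,
		.tv_nsec = (timeout_ms % 1000) * 1000000L,
	};
	int sig;

	sigemptyset(&set);
	sigaddset(&set, SIGCHLD);

	for (;;) {
		sig = sigtimedwait(&set, NULL, &ts);
		if (sig == SIGCHLD)
			return 1;
		if (errno == EAGAIN)	/* timeout expired, nothing pending */
			return 0;
		if (errno != EINTR)
			return -1;
		/* Interrupted by an unrelated handler: retry with the full timeout. */
	}
}
```

The call returns as soon as the signal is pending, so the common case does not
pay for a fixed sleep. A timeout is still needed, so this changes how the wait
is spent rather than removing the upper bound.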

-- 
Derek <debarbos@xxxxxxxxxx>