Hi Carlo

On 30/06/2025 05:13, Carlo Marcelo Arenas Belón wrote:
Since 695605b508 (git-daemon: Simplify dead-children reaping logic, 2008-08-14), the logic to check for zombie children was moved out of the SIGCHLD signal handler, but adding checks for a failed waitpid() were missed, with the possibility that a badly timed signal could prevent the promptly reaping of those defunct processes. After the refactoring of 30e1560230 (daemon: use run-command api for async serving, 2010-11-04), that reproduced that bug, a single process could be skipped from reaping, so prevent that by adding the missing error handling, and while at it make sure that ECHILD (or other errors) are correctly reported as a BUG().
I agree with your analysis; I've left a couple of comments on the fix. I noticed this when I was reading the code to see how well it handled EINTR, and decided it wasn't worth worrying about as we still collect the child the next time we call check_dead_children(), but there is no harm in checking for EINTR here. It might be worth noting in the commit message that the Linux man page for waitpid() explicitly says that EINTR cannot happen when WNOHANG is given, though. I wonder if that is the case on other platforms as well, because the calling thread is not suspended and EINTR is usually associated with calls that block.
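To make the cases concrete, here is a minimal standalone sketch of a non-blocking reap that tells "still running", "interrupted" and "real error" apart. The helper name try_reap() and its return values are made up for illustration and are not taken from daemon.c.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from daemon.c): poll one child without blocking.
 * Returns 1 if the child was reaped, 0 if it is still running or the call
 * was interrupted (so the caller retries on its next pass), and -1 on any
 * other error, leaving errno set by waitpid().
 */
static int try_reap(pid_t child)
{
	int status;
	pid_t pid = waitpid(child, &status, WNOHANG);

	if (pid > 0)
		return 1;	/* child exited and was collected */
	if (!pid)
		return 0;	/* child exists but has not exited yet */
	if (errno == EINTR)
		return 0;	/* interrupted; try again next time */
	return -1;		/* e.g. ECHILD: our bookkeeping is wrong */
}
```

If EINTR really cannot happen with WNOHANG on a given platform, the third branch is simply dead code there, which is why the extra check looks harmless to me.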
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@xxxxxxxxx>
---
 daemon.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/daemon.c b/daemon.c
index d1be61fd57..16ae66a2da 100644
--- a/daemon.c
+++ b/daemon.c
@@ -864,8 +864,11 @@ static void check_dead_children(void)
 			live_children--;
 			child_process_clear(&blanket->cld);
 			free(blanket);
-		} else
+		} else if (!pid)
Our style guidelines say that if one clause of an if statement needs braces then all the clauses should be braced.
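Roughly what I mean, as a standalone sketch rather than a replacement hunk. The function classify_reap(), the enum and the live counter are invented for the example; only the bracing pattern matters.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical outcome codes, for illustration only. */
enum reap_action { REAP_DONE, REAP_KEEP, REAP_RETRY, REAP_FAIL };

/*
 * Hypothetical stand-in for the if/else chain in the hunk: pid is the
 * value returned by waitpid() and err is the errno it left behind.
 */
static enum reap_action classify_reap(pid_t pid, int err, int *live)
{
	enum reap_action action;

	if (pid > 0) {
		/* two statements, so this clause needs braces ... */
		(*live)--;
		action = REAP_DONE;
	} else if (!pid) {
		/* ... and therefore every other clause gets them too */
		action = REAP_KEEP;
	} else if (err != EINTR) {
		action = REAP_FAIL;
	} else {
		action = REAP_RETRY;
	}
	return action;
}
```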
 			cradle = &blanket->next;
+		else if (errno != EINTR)
+			BUG("invalid child '%" PRIuMAX "'",
+			    (uintmax_t)blanket->cld.pid);
POSIX says pid_t is signed, so I'm not sure about the unsigned cast here. Do any of the platforms we support have a pid_t that is wider than a long integer? I wondered if we should be logging an error instead of calling BUG(), but I think any error other than EINTR indicates a programming error, so BUG() seems appropriate.
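For illustration, a signed cast that makes no assumption about the width of pid_t could look like the sketch below. It uses only the standard PRIdMAX and intmax_t from <inttypes.h>; the helper format_pid() is hypothetical and only there to make the example self-contained.

```c
#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format a pid_t as a signed value without
 * assuming how wide pid_t is. Returns what snprintf() returns.
 */
static int format_pid(char *buf, size_t len, pid_t pid)
{
	return snprintf(buf, len, "%" PRIdMAX, (intmax_t)pid);
}
```

With the signed conversion a pid of -1 is reported as "-1", whereas the (uintmax_t) cast with PRIuMAX would report it as a very large positive number.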
Thanks

Phillip