Phillip Wood <phillip.wood123@xxxxxxxxx> writes: > Only because you have chosen to use SA_RESTART here. I think it would > be better to drop that and instead say something like > > > POSIX leaves several aspects of the behavior of signal() up to the > implementer. It is unspecified whether long running syscalls are > restarted after an interrupt is received. The event loop in "git > daemon" assumes that poll() will return with EINTR if a child exits > while it is polling but if poll() is restarted after an interrupt is > received that does not happen which can lead to a build up of > uncollected zombie processes. > > It is also unspecified whether the handler is reset after an interrupt > is received. In order to work correctly on operating systems such as > Solaris where the handler for SIGCLD is reset when a signal is > received "git daemon" calls signal() from the SIGCLD handler to > re-establish a handler for that signal. Unfortunately this causes > infinite recursion on other operating systems such as AIX. > > Both of these problems can be addressed by using sigaction() instead > of signal() which has much stricter semantics and so, by setting the > appropriate flags, we can rely on poll() being interrupted and the > SIGCLD handler not being reset. This change means that all long > running system calls could fail with EINTR not just pall() but rest of > the event loop code is designed to gracefully handle that. > > After this change there is still a race where a child that exits after > it has been checked in check_dead_children() but before we call poll() > will not be collected until a new connection is received or a child > exits while we're polling. We could fix this by using the "self-pipe > trick" but do not because .... > > > Then we can drop patches 1 and 4. Thanks for a suggestion. I really like the "everybody in these code paths are prepared to receive and handle EINTR just fine, so we do not have to do the SA_RESTART" observation very much.