On Thu, Jun 26, 2025 at 08:33:29AM -0800, Junio C Hamano wrote: > Phillip Wood <phillip.wood123@xxxxxxxxx> writes: > > > Only because you have chosen to use SA_RESTART here. I think it would > > be better to drop that and instead say something like > > > > > > POSIX leaves several aspects of the behavior of signal() up to the > > implementer. It is unspecified whether long running syscalls are > > restarted after an interrupt is received. The event loop in "git > > daemon" assumes that poll() will return with EINTR if a child exits > > while it is polling but if poll() is restarted after an interrupt is > > received that does not happen which can lead to a build up of > > uncollected zombie processes. > > > > It is also unspecified whether the handler is reset after an interrupt > > is received. In order to work correctly on operating systems such as > > Solaris where the handler for SIGCLD is reset when a signal is > > received "git daemon" calls signal() from the SIGCLD handler to > > re-establish a handler for that signal. Unfortunately this causes > > infinite recursion on other operating systems such as AIX. > > > > Both of these problems can be addressed by using sigaction() instead > > of signal() which has much stricter semantics and so, by setting the > > appropriate flags, we can rely on poll() being interrupted and the > > SIGCLD handler not being reset. This change means that all long > > running system calls could fail with EINTR not just pall() but rest of > > the event loop code is designed to gracefully handle that. > > > > After this change there is still a race where a child that exits after > > it has been checked in check_dead_children() but before we call poll() > > will not be collected until a new connection is received or a child > > exits while we're polling. We could fix this by using the "self-pipe > > trick" but do not because .... > > > > > > Then we can drop patches 1 and 4. > > Thanks for a suggestion. I really like the "everybody in these code > paths are prepared to receive and handle EINTR just fine, so we do > not have to do the SA_RESTART" observation very much. Except that is unlikely to be true, as the code has changed a lot on those 20 years (including that it now uses run_command) and that we had been effectively using SA_RESTART under the covers for most of that time because signal() behaviour changed. An obvious bug: (ex: 20250626161038.85966-1-carenas@xxxxxxxxx) I think using SA_RESTART by default might be safer, specially considering that patch 4 already exist and it is not that complicated now that we have 2. Carlo