Re: [PATCH] daemon: handle EINTR failures from waitpid()

On Mon, Jun 30, 2025 at 10:00:09AM -0800, Phillip Wood wrote:
> 
> On 30/06/2025 05:13, Carlo Marcelo Arenas Belón wrote:
> > Since 695605b508 (git-daemon: Simplify dead-children reaping logic,
> > 2008-08-14), the logic to check for zombie children was moved out of
> > the SIGCHLD signal handler, but checks for a failed waitpid() were
> > not added, with the possibility that a badly timed signal could
> > prevent the prompt reaping of those defunct processes.
> > 
> > After the refactoring of 30e1560230 (daemon: use run-command api for
> > async serving, 2010-11-04), that reproduced that bug, a single
> > process could be skipped from reaping, so prevent that by adding the
> > missing error handling, and while at it make sure that ECHILD (or
> > other errors) are correctly reported as a BUG().
> 
> I agree with your analysis, I've left a couple of comments on the fix. I
> noticed this when I was reading the code to see how well it handled EINTR
> and decided it wasn't worth worrying about as we still collect the child the
> next time we call check_dead_children() but there is no harm in checking for
> EINTR here. It might be worth noting in the commit message that the Linux
> man page for waitpid() explicitly says that EINTR cannot happen when WNOHANG
> is given though. I wonder if that is the case on other platforms as well
> because the calling thread is not suspended and EINTR is usually associated
> with calls that block.

I wasn't aware of the comment in the Linux man page, and didn't see
something similar in the ones I checked or the POSIX specification.

If WNOHANG prevents waitpid() from returning -1 with errno == EINTR, then my
analysis is incorrect, and the last refactoring is the only one to blame, as it
didn't add error handling for ECHILD.

More importantly, if we consider that, regardless of the comment in the Linux
man page (Google found something similar in the one from zVM), that behaviour
is implementation dependent, it might be worth also fixing a similar use case
in run_command.
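
To make the defensive pattern concrete, here is a minimal standalone sketch of
a non-blocking reap that treats EINTR as "try again later" and anything else
(ECHILD included) as a real error. This is not the code from daemon.c or
run-command.c; try_reap(), spawn_and_reap() and the reap_result enum are
hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Outcome of one non-blocking reap attempt (illustrative names). */
enum reap_result { REAP_DONE, REAP_RUNNING, REAP_RETRY, REAP_ERROR };

/* Poll one child with WNOHANG and classify what waitpid() reported. */
static enum reap_result try_reap(pid_t pid, int *status)
{
	pid_t ret;

	assert(status != NULL);
	ret = waitpid(pid, status, WNOHANG);
	if (ret == pid)
		return REAP_DONE;	/* child collected */
	if (ret == 0)
		return REAP_RUNNING;	/* child has not exited yet */
	if (errno == EINTR)
		return REAP_RETRY;	/* interrupted; check again later */
	return REAP_ERROR;		/* ECHILD or another failure */
}

/* Fork a child that exits with `code`, then poll until it is reaped. */
static int spawn_and_reap(int code)
{
	struct timespec pause = { 0, 1000000 };	/* 1ms between polls */
	int status = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);
	for (;;) {
		enum reap_result r = try_reap(pid, &status);

		if (r == REAP_DONE)
			break;
		if (r == REAP_ERROR)
			return -1;
		nanosleep(&pause, NULL);	/* REAP_RUNNING or REAP_RETRY */
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_reap(7) should return 7 once the child is collected, and
calling try_reap() on a pid that is not a child of the caller should end in the
REAP_ERROR branch with errno set to ECHILD, which is the case the patch reports
with BUG().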

> >   			cradle = &blanket->next;
> > +		else if (errno != EINTR)
> > +			BUG("invalid child '%" PRIuMAX "'",
> > +			    (uintmax_t)blanket->cld.pid);
> 
> POSIX says pid_t is signed so I'm not sure about the unsigned cast here.

But that is only so that `(pid_t)-1` is valid AFAIK, and all "real" pids
are expected to be positive (even on systems where pid_t is an 8-byte long,
like Solaris).

Casting them to unsigned and using uintmax_t to print them has been how all
pids are printed since 85e7283069 (cast pid_t's to uintmax_t to improve
portability, 2008-08-31) AFAIK.
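
As a small sketch of that convention outside of git (format_pid() is a
hypothetical helper, not something in the tree), the cast plus PRIuMAX keeps
the format string correct whatever the width of pid_t is on a given platform:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format a pid into buf and return snprintf()'s result.
 * Only meant for "real" (positive) pids: a negative value such as
 * (pid_t)-1 would wrap around to a very large unsigned number.
 */
static int format_pid(char *buf, size_t len, pid_t pid)
{
	assert(pid > 0);
	return snprintf(buf, len, "%" PRIuMAX, (uintmax_t)pid);
}
```

With a 32-byte buffer, format_pid(buf, sizeof(buf), 1234) stores "1234" and
returns 4. The assert() is only there to make the "real pids are positive"
assumption explicit.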

> Do
> any of the platforms we support have a pid_t that is wider than a long
> integer?

The ones in AIX are pretty long, but definitely no larger than INT_MAX (with
pid_t being 4 bytes long there).

Carlo



