Re: [BUG] deadlocks due to setenv in lookup_program.c

On 11/9/25 21:22, Jun Eeo wrote:
Hi,

When using the lookup_program module, we've seen an issue where the
main automount process waits forever for the forked child to produce
output (and quit). As a result, any process that requires the
automount blocks in autofs_wait.

A coredump of the forked child showed that it was stuck waiting on a
lock when calling setenv (it never got as far as the execl):

#0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:63
#1  0x00007ffff60505ce in __add_to_environ (name=0x55555558b2e0 "autodir", value=0x55555558b300 "/a",
     combined=0x0, replace=1) at setenv.c:133

This seems to be caused by the macro_setenv call in lookup_one in
modules/lookup_program.c -- that calls into glibc's setenv, and if
another thread held envlock at the moment of the fork, the child
inherits the lock in its locked state with no thread left to release it.
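
To illustrate the suspected mechanism, here is a minimal sketch using an
ordinary pthread mutex as a stand-in for glibc's envlock. The names
demo_lock, holder and child_sees_held_lock are made up for this example
and are not autofs or glibc code. A helper thread takes the lock, the
main thread forks, and the child probes the lock with trylock so the
demo reports the problem instead of hanging on it.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for glibc's internal envlock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken;   /* helper -> main: "I hold demo_lock" */
static sem_t may_release;  /* main -> helper: "you may unlock now" */

/* Helper thread: holds demo_lock until the main thread allows release. */
static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    sem_post(&lock_taken);
    sem_wait(&may_release);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/*
 * Returns 1 if the forked child found demo_lock already locked,
 * 0 if the child could take it, and -1 on error.
 */
int child_sees_held_lock(void)
{
    pthread_t tid;
    int status = 0;
    int result = -1;
    pid_t pid;

    sem_init(&lock_taken, 0, 0);
    sem_init(&may_release, 0, 0);
    if (pthread_create(&tid, NULL, holder, NULL) != 0)
        return -1;
    sem_wait(&lock_taken);          /* the helper now owns demo_lock */

    pid = fork();
    if (pid == 0) {
        /*
         * Only the forking thread exists in the child. The helper
         * thread is gone, so nothing here can ever unlock demo_lock.
         * A plain lock() would block forever; trylock just reports it.
         */
        _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    sem_post(&may_release);         /* let the helper unlock and exit */
    pthread_join(tid, NULL);
    return result;
}
```

Calling child_sees_held_lock() from main() should return 1: the child's
copy of the mutex is locked even though the owning thread does not exist
in the child. A setenv in that position would wait on the lock
indefinitely, which matches the backtrace above.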

I guess there are two approaches to fixing the issue and I'd
like to get some thoughts before sending a patch:

1. Not calling out to any environment-mutating functions (clearenv,
    putenv, setenv) which can hold the envlock after the initial setup.
    In our deployment, the only place this happens is macro_setenv
    and the sd_notify call in daemon/automount.c.

2. Avoiding the use of setenv in the forked child (it is MT-Unsafe
    anyway). We can copy environ and use something like execle.

Certain macros are required to implement the Sun map functionality,
so not using environment-setting functions isn't really OK, I think.

Sounds like not using env-modifying calls in the forked child is
preferable, although the child environment is independent in this
case. From what I've seen, MT-unsafe problems occur when you do
something in the parent MT environment and don't undo it in both
the parent and forked child.

I tested both patches about a week ago which seemed to fix the
problem; there are no more deadlocks of this shape.

Post patches and we can discuss the implications.


I'm happy to change things if needed. But it sounds a little bit
like there may be a glibc bug at play too (unlikely); we'll need to
understand what's going on there as well.

Ian




