PIDFD_THREAD behavior for thread-group leaders

Christian Brauner <brauner@xxxxxxxxxx> · Thu, 6 Mar 2025 12:41:26 +0100

Oleg,

I've been thinking about the multi-threaded exec case where a
non-thread-group-leader task execs and assumes the thread-group leader's
struct pid.

Back when we implemented support for PIDFD_THREAD we ended up with the
decision that if userspace holds:

pidfd_leader_thread = pidfd_open(<thread-group-leader-pid>, PIDFD_THREAD)

then exit notification is not strictly defined if a non-thread-group
leader thread execs: If poll is called before the exec happens, an
exit notification may be observed; if it is called afterwards, no exit
notification is generated for the old thread-group leader. Or, if the
exit of the old thread-group leader was already observed and poll is
called again, it blocks again.
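
For reference, the userspace side of the above can be sketched roughly as
follows. This is only a minimal illustration, not taken from any existing
test: open_thread_pidfd() and thread_exit_pending() are made-up helper
names, the PIDFD_THREAD fallback assumes the uapi definition (O_EXCL), and
the syscall number fallback of 434 is an assumption that holds on most but
not all architectures. It requires a kernel with PIDFD_THREAD support;
older kernels are expected to fail the open with EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for older userspace headers (assumptions, see above). */
#ifndef PIDFD_THREAD
#define PIDFD_THREAD O_EXCL
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/*
 * Hypothetical helper: open a thread-specific pidfd for @tid.
 * Passing the thread-group leader's pid here gives the
 * pidfd_leader_thread case discussed above.
 * Returns the fd, or -1 with errno set.
 */
static inline int open_thread_pidfd(pid_t tid)
{
    return (int)syscall(SYS_pidfd_open, tid, PIDFD_THREAD);
}

/*
 * Hypothetical helper: non-blocking check for an exit notification.
 * Returns 1 if POLLIN is set on the pidfd, 0 if not, -1 on poll error.
 * Whether a 1 is seen around a non-leader exec depends on when this is
 * called relative to the exec, which is the ambiguity described above.
 */
static inline int thread_exit_pending(int pidfd)
{
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int ret = poll(&pfd, 1, 0);

    if (ret < 0)
        return -1;
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

A waiter would call thread_exit_pending() (or a blocking poll) on the fd
returned by open_thread_pidfd(<thread-group-leader-pid>).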

I was wondering why the following snippet wouldn't work to ensure that
PIDFD_THREAD pidfds for thread-group leaders wouldn't be woken with
spurious exits:

diff --git a/kernel/exit.c b/kernel/exit.c
index 9916305e34d3..b79ded1b3bf5 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -745,8 +745,11 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
        /*
         * sub-thread or delay_group_leader(), wake up the
         * PIDFD_THREAD waiters.
+        *
+        * The thread-group leader will be taken over by the execing
+        * task so don't cause spurious wakeups.
         */
-       if (!thread_group_empty(tsk))
+       if (!thread_group_empty(tsk) && (tsk->signal->notify_count >= 0))
                do_notify_pidfd(tsk);

        if (unlikely(tsk->ptrace)) {

Because that would seem more consistent to me. The downside would be
that if userspace performed a series of multi-threaded execs from
non-thread-group leader threads then waiters wouldn't get woken. But I
think that's probably ok.

To handle this case we could later think about whether we can instead
start generating a separate poll (POLLPRI?) event when exec happens.

I'm probably missing some very obvious reason why that won't work.

Christian