On Fri, Apr 04, 2025 at 02:37:38PM +0200, Oleg Nesterov wrote: > On 04/03, Christian Brauner wrote: > > > > We currently report EINVAL whenever a struct pid has no tasked attached > > anymore thereby conflating two concepts: > > > > (1) The task has already been reaped. > > (2) The caller requested a pidfd for a thread-group leader but the pid > > actually references a struct pid that isn't used as a thread-group > > leader. > > > > This is causing issues for non-threaded workloads as in [1]. > > > > This patch tries to allow userspace to distinguish between (1) and (2). > > This is racy of course but that shouldn't matter. > > > > Link: https://github.com/systemd/systemd/pull/36982 [1] > > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx> > > For this series: > > Reviewed-by: Oleg Nesterov <oleg@xxxxxxxxxx> > > > But I have a couple of cosmetic nits... > > > int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret) > > { > > - bool thread = flags & PIDFD_THREAD; > > + int err = 0; > > > > - if (!pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID)) > > - return -EINVAL; > > + if (!(flags & PIDFD_THREAD)) { > > + /* > > + * If this is struct pid isn't used as a thread-group > > + * leader pid but the caller requested to create a > > + * thread-group leader pidfd then report ENOENT to the > > + * caller as a hint. > > + */ > > + if (!pid_has_task(pid, PIDTYPE_TGID)) > > + err = -ENOENT; > > + } > > + > > + /* > > + * If this wasn't a thread-group leader struct pid or the task > > + * got reaped in the meantime report -ESRCH to userspace. > > + * > > + * This is racy of course. This could've not been a thread-group > > + * leader struct pid and we set ENOENT above but in the meantime > > + * the task got reaped. Or there was a multi-threaded-exec by a > > + * subthread and we were a thread-group leader but now got > > + * killed. > > The comment about the multi-threaded-exec looks a bit misleading to me. > If this pid is a group-leader-pid and we race with de_thread() which does > > exchange_tids(tsk, leader); > transfer_pid(leader, tsk, PIDTYPE_TGID); > > nothing "bad" can happen, both pid_has_task(PIDTYPE_PID) or > pid_has_task(PIDTYPE_TGID) can't return NULL during (or after) this > transition. > > hlists_swap_heads_rcu() or hlist_replace_rcu() can't make > hlist_head->first == NULL during this transition... Good point. > > Or I misunderstood the comment? > > And... the code looks a bit overcomplicated to me, why not simply > > int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret) > { > if (!pid_has_task(pid, PIDTYPE_PID)) > return -ESRCH; > > if (!(flags & PIDFD_THREAD) && !pid_has_task(pid, PIDTYPE_TGID)) > return -ENOENT; I thought that checking PIDTYPE_PID first could cause misleading results where we report ENOENT where we should report ESRCH: If the task was released after the successful PIDTYPE_PID check for a pid that was never a thread-group leader we report ENOENT. That's what I reversed the check. But I can adapt that to you scheme. I mostly wanted a place to put the comments. > > return __pidfd_prepare(pid, flags, ret); > } > > ? Of course, the comments should stay. > > But again, this is cosmetic/subjective, please do what you like more. > > Oleg. >