Re: [PATCH RFC 3/4] pidfd: improve uapi when task isn't found

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 04, 2025 at 02:37:38PM +0200, Oleg Nesterov wrote:
> On 04/03, Christian Brauner wrote:
> >
> > We currently report EINVAL whenever a struct pid has no tasked attached
> > anymore thereby conflating two concepts:
> >
> > (1) The task has already been reaped.
> > (2) The caller requested a pidfd for a thread-group leader but the pid
> >     actually references a struct pid that isn't used as a thread-group
> >     leader.
> >
> > This is causing issues for non-threaded workloads as in [1].
> >
> > This patch tries to allow userspace to distinguish between (1) and (2).
> > This is racy of course but that shouldn't matter.
> >
> > Link: https://github.com/systemd/systemd/pull/36982 [1]
> > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
> 
> For this series:
> 
> Reviewed-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> 
> 
> But I have a couple of cosmetic nits...
> 
> >  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> >  {
> > -	bool thread = flags & PIDFD_THREAD;
> > +	int err = 0;
> >
> > -	if (!pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
> > -		return -EINVAL;
> > +	if (!(flags & PIDFD_THREAD)) {
> > +		/*
> > +		 * If this is struct pid isn't used as a thread-group
> > +		 * leader pid but the caller requested to create a
> > +		 * thread-group leader pidfd then report ENOENT to the
> > +		 * caller as a hint.
> > +		 */
> > +		if (!pid_has_task(pid, PIDTYPE_TGID))
> > +			err = -ENOENT;
> > +	}
> > +
> > +	/*
> > +	 * If this wasn't a thread-group leader struct pid or the task
> > +	 * got reaped in the meantime report -ESRCH to userspace.
> > +	 *
> > +	 * This is racy of course. This could've not been a thread-group
> > +	 * leader struct pid and we set ENOENT above but in the meantime
> > +	 * the task got reaped. Or there was a multi-threaded-exec by a
> > +	 * subthread and we were a thread-group leader but now got
> > +	 * killed.
> 
> The comment about the multi-threaded-exec looks a bit misleading to me.
> If this pid is a group-leader-pid and we race with de_thread() which does
> 
> 		exchange_tids(tsk, leader);
> 		transfer_pid(leader, tsk, PIDTYPE_TGID);
> 
> nothing "bad" can happen, both pid_has_task(PIDTYPE_PID) or
> pid_has_task(PIDTYPE_TGID) can't return NULL during (or after) this
> transition.
> 
> hlists_swap_heads_rcu() or hlist_replace_rcu() can't make
> hlist_head->first == NULL during this transition...

Good point.

> 
> Or I misunderstood the comment?
> 
> And... the code looks a bit overcomplicated to me, why not simply
> 
> 	int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> 	{
> 		if (!pid_has_task(pid, PIDTYPE_PID))
> 			return -ESRCH;
> 
> 		if (!(flags & PIDFD_THREAD) && !pid_has_task(pid, PIDTYPE_TGID))
> 			return -ENOENT;

I thought that checking PIDTYPE_PID first could cause misleading results
where we report ENOENT where we should report ESRCH: If the task was
released after the successful PIDTYPE_PID check for a pid that was never
a thread-group leader we report ENOENT. That's what I reversed the
check. But I can adapt that to you scheme. I mostly wanted a place to
put the comments.

> 
> 		return __pidfd_prepare(pid, flags, ret);
> 	}
> 
> ? Of course, the comments should stay.
> 
> But again, this is cosmetic/subjective, please do what you like more.
> 
> Oleg.
> 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux