On 04/10, Christian Brauner wrote: > > On Thu, Apr 10, 2025 at 12:18:01PM +0200, Oleg Nesterov wrote: > > On 04/09, Oleg Nesterov wrote: > > > > > > On 04/09, Christian Brauner wrote: > > > > > > > > The seqcounter might be > > > > useful independent of pidfs. > > > > > > Are you sure? ;) to me the new pid->pid_seq needs more justification... > > Yeah, pretty much. I'd make use of this in other cases where we need to > detect concurrent changes to struct pid without having to take any > locks. Multi-threaded exec in de_exec() comes to mind as well. Perhaps you are right, but so far I am still not sure it makes sense. And we can always add it later if we have another (more convincing) use-case. > > To remind, detach_pid(pid, PIDTYPE_PID) does wake_up_all(&pid->wait_pidfd) and > > takes pid->wait_pidfd->lock. > > > > So if pid_has_task(PIDTYPE_PID) succeeds, __unhash_process() -> detach_pid(TGID) > > is not possible until we drop pid->wait_pidfd->lock. > > > > If detach_pid(PIDTYPE_PID) was already called and have passed wake_up_all(), > > pid_has_task(PIDTYPE_PID) can't succeed. > > I know. I was trying to avoid having to take the lock and just make this > lockless. But if you think we should use this lock here instead I'm > willing to do this. I just find the sequence counter more elegant than > the spin_lock_irq(). This is subjective, and quite possibly I am wrong. But yes, I'd prefer to (ab)use pid->wait_pidfd->lock in pidfd_prepare() for now and not penalize __unhash_process(). Simply because this is simpler. If you really dislike taking wait_pidfd->lock, we can add mb() into __unhash_process() or even smp_mb__after_spinlock() into __change_pid(), but this will need a lengthy comment... As for your patch... it doesn't apply on top of 3/4, but I guess it is clear what does it do, and (unfortunately ;) it looks correct, so I won't insist too much. See a couple of nits below. > this imho and it would give pidfds a reliable way to detect relevant > concurrent changes locklessly without penalizing other critical paths > (e.g., under tasklist_lock) in the kernel. Can't resist... Note that raw_seqcount_begin() in pidfd_prepare() will take/drop tasklist_lock if it races with __unhash_process() on PREEMPT_RT. Yes, this is unlikely case, but still... Now. Unless I misread your patch, pidfd_prepare() does "err = 0" only once before the main loop. And this is correct. But this means that we do not need the do/while loop. If read_seqcount_retry() returns true, we can safely return -ESRCH. So we can do seq = raw_seqcount_begin(&pid->pid_seq); if (!PIDFD_THREAD && !pid_has_task(PIDTYPE_TGID)) err = -ENOENT; if (!pid_has_task(PIDTYPE_PID)) err = -ESRCH; if (read_seqcount_retry(pid->pid_seq, seq)) err = -ESRCH; In fact we don't even need raw_seqcount_begin(), we could use raw_seqcount_try_begin(). And why seqcount_rwlock_t? A plain seqcount_t can equally work. And, if we use seqcount_rwlock_t, lockdep_assert_held_write(&tasklist_lock); ... raw_write_seqcount_begin(pid->pid_seq); in __unhash_process() looks a bit strange. I'd suggest to use write_seqcount_begin() which does seqprop_assert() and kill lockdep_assert_held_write(). Oleg.