On Fri, May 16, 2025 at 12:34 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> On Thu, May 15, 2025 at 10:56:26PM +0200, Jann Horn wrote:
> > Why can we safely put the pidfs reference now but couldn't do it
> > before the kernel_connect()? Does the kernel_connect() look up this
> > pidfs entry by calling something like pidfs_alloc_file()? Or does that
> > only happen later on, when the peer does getsockopt(SO_PEERPIDFD)?
>
> AF_UNIX sockets support SO_PEERPIDFD as you know. Users such as dbus or
> systemd want to be able to retrieve a pidfd for the peer even if the
> peer has already been reaped. To support this, AF_UNIX ensures that when
> the peer credentials are set up (connect(), listen()) the corresponding
> @pid will also be registered in pidfs. This ensures that exit
> information is stored in the inode if we hand out a pidfd for a reaped
> task. IOW, we only hand out pidfds for reaped tasks if at the time of
> reaping a pidfs entry existed for it.
>
> Since we're setting coredump information on the pidfd here, we're calling
> pidfs_register_pid() even before connect() sets up the peer credentials
> so we're sure that the coredump information is stored in the inode.
>
> Then we delay our pidfs_put_pid() call until the connect() took its own
> reference and thus continues pinning the inode. IOW, connect() will also
> call pidfs_register_pid() but it will ofc just increment the reference
> count, ensuring that our pidfs_put_pid() doesn't drop the inode.

Aah, so the call graph looks like this:

unix_stream_connect
  prepare_peercred
    pidfs_register_pid [pidfs reference taken]
  [point of no return]
  init_peercred [copies creds to socket, moving ref ownership]
  copy_peercred [copies creds from socket to peer socket, taking refs]

Thanks for explaining!