From: Luca Boccassi <bluca@xxxxxxxxxx>
Date: Mon, 12 May 2025 11:58:54 +0100
> On Mon, 12 May 2025 at 09:56, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > Coredumping currently supports two modes:
> >
> > (1) Dumping directly into a file somewhere on the filesystem.
> > (2) Dumping into a pipe connected to a usermode helper process
> >     spawned as a child of the system_unbound_wq or kthreadd.
> >
> > For simplicity I'm mostly ignoring (1). There's probably still some
> > users of (1) out there but processing coredumps in this way can be
> > considered adventurous especially in the face of set*id binaries.
> >
> > The most common option should be (2) by now. It works by allowing
> > userspace to put a string into /proc/sys/kernel/core_pattern like:
> >
> >         |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
> >
> > The "|" at the beginning indicates to the kernel that a pipe must be
> > used. The path following the pipe indicator is a path to a binary that
> > will be spawned as a usermode helper process. Any additional parameters
> > pass information about the task that is generating the coredump to the
> > binary that processes the coredump.
> >
> > In the example core_pattern shown above systemd-coredump is spawned as a
> > usermode helper. There's various conceptual consequences of this
> > (non-exhaustive list):
> >
> > - systemd-coredump is spawned with file descriptor number 0 (stdin)
> >   connected to the read-end of the pipe. All other file descriptors are
> >   closed. That specifically includes 1 (stdout) and 2 (stderr). This has
> >   already caused bugs because userspace assumed that this cannot happen
> >   (Whether or not this is a sane assumption is irrelevant.).
> >
> > - systemd-coredump will be spawned as a child of system_unbound_wq. So
> >   it is not a child of any userspace process and specifically not a
> >   child of PID 1. It cannot be waited upon and is in a weird hybrid
> >   upcall which are difficult for userspace to control correctly.
> >
> > - systemd-coredump is spawned with full kernel privileges. This
> >   necessitates all kinds of weird privilege dropping exercises in
> >   userspace to make this safe.
> >
> > - A new usermode helper has to be spawned for each crashing process.
> >
> > This series adds a new mode:
> >
> > (3) Dumping into an abstract AF_UNIX socket.
> >
> > Userspace can set /proc/sys/kernel/core_pattern to:
> >
> >         @address SO_COOKIE
> >
> > The "@" at the beginning indicates to the kernel that the abstract
> > AF_UNIX coredump socket will be used to process coredumps. The address
> > is given by @address and must be followed by the socket cookie of the
> > coredump listening socket.
> >
> > The socket cookie is used to verify the socket connection. If the
> > coredump server restarts or crashes and someone recycles the socket
> > address the kernel will detect that the address has been recycled as the
> > socket cookie will have necessarily changed and refuse to connect.
>
> This dynamic/cookie prefix makes it impossible to use this with socket
> activation units. The way systemd-coredump works is that every
> instance is an independent templated unit, spawned when there's a
> connection to the private socket. If the path was fixed, we could just
> reuse the same mechanism, it would fit very nicely with minimal
> changes.

Note this version does not use a prefix.  Now it requires users to just
pass the socket cookie via core_pattern so that the kernel can verify
the peer.

>
> But because you need a "server" to be permanently running, this means
> socket-based activation can no longer work, and systemd-coredump must
> switch to a persistently-running mode.

The only thing for systemd to do is assign a cookie after socket
creation.  As long as systemd holds the file descriptor of the socket,
you don't need a dedicated "server" running permanently, and the fd
can be passed around to a spawned/activated process.
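
As a rough illustration only (not code from the series), the userspace
side could look something like the sketch below.  It assumes the
"@address SO_COOKIE" syntax exactly as quoted above, which may differ in
other revisions.  The socket name "example-coredump" and all function
names are hypothetical.  Python's socket module does not export
SO_COOKIE, so the sketch falls back to 57, the value in Linux's
asm-generic/socket.h; a few architectures use a different number.

```python
import socket
import struct

# Assumption: 57 is SO_COOKIE from asm-generic/socket.h (x86-64, arm64, ...).
SO_COOKIE = getattr(socket, "SO_COOKIE", 57)


def open_coredump_listener(name: str) -> socket.socket:
    """Bind and listen on an abstract AF_UNIX socket (leading NUL byte)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(b"\0" + name.encode())
    sock.listen(socket.SOMAXCONN)
    return sock


def read_socket_cookie(sock: socket.socket) -> int:
    """Return the 64-bit cookie the kernel associates with this socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, SO_COOKIE, 8)
    return struct.unpack("=Q", raw)[0]


def format_core_pattern(name: str, cookie: int) -> str:
    """Build the "@address SO_COOKIE" string as described in the quoted text."""
    return f"@{name} {cookie}"


if __name__ == "__main__":
    listener = open_coredump_listener("example-coredump")  # hypothetical name
    pattern = format_core_pattern("example-coredump", read_socket_cookie(listener))
    # A privileged manager would write this string to
    # /proc/sys/kernel/core_pattern and keep `listener` open, or pass the
    # fd to an activated worker, so that the cookie stays valid.
    print(pattern)
```

The cookie belongs to the socket rather than to a process, so it stays
the same for as long as some process keeps the listening fd open.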

> This is a severe degradation of
> functionality, will continuously waste CPU/memory resources for no
> good reasons, and makes the whole thing more fragile and complex, as
> if there are any issues with this server, you start losing core files.
> And honestly I don't really see the point? Setting the pattern is a
> privileged operation anyway. systemd manages the socket with a socket
> unit and again that's privileged already.
>
> Could we drop this cookie prefix and go back to the previous version
> (v5), please? Or if there is some specific non-systemd use case in
> mind that I am not aware of, have both options, so that we can use the
> simpler and more straightforward one with systemd-coredump.