On Wed, May 14, 2025 at 02:03:31AM +0200, Mateusz Guzik wrote: > On Wed, May 14, 2025 at 12:17 AM Eric W. Biederman > <ebiederm@xxxxxxxxxxxx> wrote: > > > > Jann Horn <jannh@xxxxxxxxxx> writes: > > > > > On Tue, May 13, 2025 at 10:57 PM Kees Cook <kees@xxxxxxxxxx> wrote: > > >> On May 13, 2025 6:05:45 AM PDT, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote: > > >> >Here is my proposal: *deny* exec of suid/sgid binaries if fs_struct is > > >> >shared. This will have to be checked for after the execing proc becomes > > >> >single-threaded ofc. > > >> > > >> Unfortunately the above Chrome helper is setuid and uses CLONE_FS. > > > > > > Chrome first launches a setuid helper, and then the setuid helper does > > > CLONE_FS. Mateusz's proposal would not impact this usecase. > > > > > > Mateusz is proposing to block the case where a process first does > > > CLONE_FS, and *then* one of the processes sharing the fs_struct does a > > > setuid execve(). Linux already downgrades such an execve() to be > > > non-setuid, which probably means anyone trying to do this will get > > > hard-to-understand problems. Mateusz' proposal would just turn this > > > hard-to-debug edgecase, which already doesn't really work, into a > > > clean error; I think that is a nice improvement even just from the > > > UAPI standpoint. > > > > > > If this change makes it possible to clean up the kernel code a bit, even better. > > > > What has brought this to everyone's attention just now? This is > > the second mention of this code path I have seen this week. > > > > There is a syzkaller report concerning ->in_exec handling, for example: > https://lore.kernel.org/all/67dc67f0.050a0220.25ae54.001f.GAE@xxxxxxxxxx/#t > > [...] > > It looks like most of the lsm's also test bprm->unsafe. > > > > So I imagine someone could very carefully separate the non-ptrace case > > from the ptrace case but *shrug*. > > > > Perhaps: > > > > if ((is_setid || __cap_gained(permitted, new_old)) && > > ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) || > > !ptracer_capable(current, new->user_ns))) { > > + if (!(bprm->unsafe & LSM_UNSAFE_PTRACE)) { > > + return -EPERM; > > + } > > /* downgrade; they get no more than they had, and maybe less */ > > if (!ns_capable(new->user_ns, CAP_SETUID) || > > (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) { > > new->euid = new->uid; > > new->egid = new->gid; > > } > > new->cap_permitted = cap_intersect(new->cap_permitted, > > old->cap_permitted); > > } > > > > If that is what you want that doesn't look to scary. I don't think > > it simplifies anything about fs->in_exec. As fs->in_exec is set when > > the processing calling exec is the only process that owns the fs_struct. > > With fs->in_exec just being a flag that doesn't allow another thread > > to call fork and start sharing the fs_struct during exec. > > > > *Shrug* > > > > I don't see why anyone would care. It is just a very silly corner case. > > Well I don't see how ptrace factors into any of this, apart from being > a different case of ignoring suid/sgid. I actually think we might want to expand the above bit of logic to use an explicit tests of each LSM_UNSAFE case -- the merged logic is very difficult to read currently. Totally untested expansion, if I'm reading everything correctly: if (bprm->unsafe && (is_setid || __cap_gained(permitted, new_old))) { bool limit_caps = false; bool strip_eid = false; unsigned int unsafe = bprm->unsafe; /* Check each bit */ if (unsafe & LSM_UNSAFE_PTRACE) { if (!ptracer_capable(current, new->user_ns)) limit_caps = true; unsafe &= ~LSM_UNSAFE_PTRACE; } if (unsafe & LSM_UNSAFE_SHARE) { limit_caps = true; if (!ns_capable(new->user_ns, CAP_SETUID)) strip_eid = true; unsafe &= ~LSM_UNSAFE_SHARE; } if (unsafe & LSM_UNSAFE_NO_NEW_PRIVS) { limit_caps = true; if (!ns_capable(new->user_ns, CAP_SETUID)) strip_eid = true; unsafe &= ~LSM_UNSAFE_NO_NEW_PRIVS; } if (WARN(unsafe, "Unhandled LSM_UNSAFE flag: %u?!\n", unsafe)) return -EINVAL; if (limit_caps) { new->cap_permitted = cap_intersect(new->cap_permitted, old->cap_permitted); } if (strip_eid) { new->euid = new->uid; new->egid = new->gid; } } > I can agree the suid/sgid situation vs CLONE_FS is a silly corner > case, but one which needs to be handled for security reasons and which > currently has weirdly convoluted code to do it. > > The intent behind my proposal is very much to get the crapper out of > the way in a future-proof and simple manner. > > In check_unsafe_exec() you can find a nasty loop over threads in the > group to find out if the fs struct is used by anyone outside of said > group. Since fs struct users are not explicitly tracked and any of > them can have different creds than the current thread, the kernel opts > to ignore suid/sgid if there are extra users found (for security > reasons). The loop depends on no new threads showing up as the list is > being walked, to that end copy_fs() can transiently return an error if > it spots ->in_exec. > > The >in_exec field is used as a boolean/flag, but parallel execs using > the same fs struct from different thread groups don't look serialized. > This is supposed to be fine as in this case ->in_exec is not getting > set to begin with, but it gets unconditionally unset on all execs. > > And so on. It's all weird af. 100% :) -- Kees Cook