On Mon, Mar 24, 2025 at 5:00 PM Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > check_unsafe_exec() sets fs->in_exec under cred_guard_mutex, then execve() > paths clear fs->in_exec lockless. This is fine if exec succeeds, but if it > fails we have the following race: > > T1 sets fs->in_exec = 1, fails, drops cred_guard_mutex > > T2 sets fs->in_exec = 1 > > T1 clears fs->in_exec > > T2 continues with fs->in_exec == 0 > > Change fs/exec.c to clear fs->in_exec with cred_guard_mutex held. > I had cursory glances at this code earlier and the more I try to understand it the more confused I am. The mutex at hand hides in ->signal and fs->in_exec remains treated as a flag. The loop in check_unsafe_exec() tries to guard against a task which shares ->fs, but does not share ->mm? To my reading this implies unshared ->signal, so the mutex protection does not apply. I think this ends up being harmless as in this case nobody is going to set ->in_exec (instead they are going to share LSM_UNSAFE_SHARE), so clearing it in these spots becomes a nop. At the same time the check in copy_fs() no longer blocks clones as check_unsafe_exec() already opts to LSM_UNSAFE_SHARE? Even if this all works with the patch, this is an incredibly odd set of dependencies and I don't see a good reason for it to still be here. Per my other e-mail the obvious scheme would serialize all execs sharing ->fs and make copy_fs do a killable wait for execs to finish. Arguably this would also improve userspace-visible behavior as a transient -EBUSY would be eliminated. No matter what the specific solution, imo treating ->in_exec as a flag needs to die. is there a problem getting this done even for stable kernels? I understand it would be harder to backport churn-wise, but should be much easier to reason about? Just my $0,03 -- Mateusz Guzik <mjguzik gmail.com>