Damn, I am stupid. On 03/24, Oleg Nesterov wrote: > > check_unsafe_exec() sets fs->in_exec under cred_guard_mutex, then execve() > paths clear fs->in_exec lockless. This is fine if exec succeeds, but if it > fails we have the following race: > > T1 sets fs->in_exec = 1, fails, drops cred_guard_mutex > > T2 sets fs->in_exec = 1 > > T1 clears fs->in_exec When I look at this code again, I think this race was not possible and thus this patch (applied as af7bb0d2ca45) was not needed. Yes, begin_new_exec() can drop cred_guard_mutex on failure, but only after de_thread() succeeds, when we can't race with another sub-thread. I hope this patch didn't make the things worse so we don't need to revert it. Plus I think it makes this (confusing) logic a bit more clear. Just, unless I am confused again, it wasn't really needed. ----------------------------------------------------------------------------- But. I didn't read the original report from syzbot, https://lore.kernel.org/all/67dc67f0.050a0220.25ae54.001f.GAE@xxxxxxxxxx/#t because I wasn't CC'ed. and then - sorry Kees!!! - I didn't bother to read your first reply carefully. So yes, with or without this patch the "if (fs->in_exec)" check in copy_fs() can obviously hit the 1 -> 0 transition. This is harmless, but should be probably fixed just to avoid another report from KCSAN. I do not want to add another spin_lock(fs->lock). We can change copy_fs() to use data_race(), but I'd prefer the patch below. Yes, it needs the additional comment(s) to explain READ_ONCE(). What do you think? Did I miss somthing again??? Quite possibly... Mateusz, I hope you will cleanup this horror sooner or later ;) Oleg. --- diff --git a/fs/exec.c b/fs/exec.c index 5d1c0d2dc403..42a7f9b43911 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1495,7 +1495,7 @@ static void free_bprm(struct linux_binprm *bprm) free_arg_pages(bprm); if (bprm->cred) { /* in case exec fails before de_thread() succeeds */ - current->fs->in_exec = 0; + WRITE_ONCE(current->fs->in_exec, 0); mutex_unlock(¤t->signal->cred_guard_mutex); abort_creds(bprm->cred); } diff --git a/kernel/fork.c b/kernel/fork.c index 4c2df3816728..381af8c8ece8 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1802,7 +1802,7 @@ static int copy_fs(unsigned long clone_flags, struct task_struct *tsk) /* tsk->fs is already what we want */ spin_lock(&fs->lock); /* "users" and "in_exec" locked for check_unsafe_exec() */ - if (fs->in_exec) { + if (READ_ONCE(fs->in_exec)) { spin_unlock(&fs->lock); return -EAGAIN; }