On Tue, Apr 29, 2025 at 5:50 PM Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > Damn, I am stupid. > > On 03/24, Oleg Nesterov wrote: > > > > check_unsafe_exec() sets fs->in_exec under cred_guard_mutex, then execve() > > paths clear fs->in_exec lockless. This is fine if exec succeeds, but if it > > fails we have the following race: > > > > T1 sets fs->in_exec = 1, fails, drops cred_guard_mutex > > > > T2 sets fs->in_exec = 1 > > > > T1 clears fs->in_exec > > When I look at this code again, I think this race was not possible and thus > this patch (applied as af7bb0d2ca45) was not needed. > > Yes, begin_new_exec() can drop cred_guard_mutex on failure, but only after > de_thread() succeeds, when we can't race with another sub-thread. > > I hope this patch didn't make the things worse so we don't need to revert it. > Plus I think it makes this (confusing) logic a bit more clear. Just, unless > I am confused again, it wasn't really needed. > > ----------------------------------------------------------------------------- > But. I didn't read the original report from syzbot, > https://lore.kernel.org/all/67dc67f0.050a0220.25ae54.001f.GAE@xxxxxxxxxx/#t > because I wasn't CC'ed. and then - sorry Kees!!! - I didn't bother to read > your first reply carefully. > > So yes, with or without this patch the "if (fs->in_exec)" check in copy_fs() > can obviously hit the 1 -> 0 transition. > > This is harmless, but should be probably fixed just to avoid another report > from KCSAN. > > I do not want to add another spin_lock(fs->lock). We can change copy_fs() to > use data_race(), but I'd prefer the patch below. Yes, it needs the additional > comment(s) to explain READ_ONCE(). > > What do you think? Did I miss somthing again??? Quite possibly... > > Mateusz, I hope you will cleanup this horror sooner or later ;) > > Oleg. > --- > > diff --git a/fs/exec.c b/fs/exec.c > index 5d1c0d2dc403..42a7f9b43911 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1495,7 +1495,7 @@ static void free_bprm(struct linux_binprm *bprm) > free_arg_pages(bprm); > if (bprm->cred) { > /* in case exec fails before de_thread() succeeds */ > - current->fs->in_exec = 0; > + WRITE_ONCE(current->fs->in_exec, 0); > mutex_unlock(¤t->signal->cred_guard_mutex); > abort_creds(bprm->cred); > } > diff --git a/kernel/fork.c b/kernel/fork.c > index 4c2df3816728..381af8c8ece8 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1802,7 +1802,7 @@ static int copy_fs(unsigned long clone_flags, struct task_struct *tsk) > /* tsk->fs is already what we want */ > spin_lock(&fs->lock); > /* "users" and "in_exec" locked for check_unsafe_exec() */ > - if (fs->in_exec) { > + if (READ_ONCE(fs->in_exec)) { > spin_unlock(&fs->lock); > return -EAGAIN; > } > good grief man ;) I have this on my TODO list, (un)fortunately $life got in the way and by now I swapped out almost all of the context. I mostly remember the code is hard to follow. ;) that said, maybe i'll give it a stab soon(tm). I have a testcase somewhere to validate that this does provide the expect behavior vs suid, so it's not going to be me flying blindly. -- Mateusz Guzik <mjguzik gmail.com>