Mateusz Guzik <mjguzik@xxxxxxxxx> writes:

> On Thu, Oct 06, 2022 at 08:25:01AM -0700, Kees Cook wrote:
>> On October 6, 2022 7:13:37 AM PDT, Jann Horn <jannh@xxxxxxxxxx> wrote:
>> >On Thu, Oct 6, 2022 at 11:05 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>> >> On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote:
>> >> > The check_unsafe_exec() counting of n_fs would not add up under a heavily
>> >> > threaded process trying to perform a suid exec, causing the suid portion
>> >> > to fail. This counting error appears to be unneeded, but to catch any
>> >> > possible conditions, explicitly unshare fs_struct on exec, if it ends up
>> >>
>> >> Isn't this a potential uapi break? Afaict, before this change a call to
>> >> clone{3}(CLONE_FS) followed by an exec in the child would have the
>> >> parent and child share fs information. So if the child e.g., changes the
>> >> working directory post exec it would also affect the parent. But after
>> >> this change here this would no longer be true. So a child changing a
>> >> working directory would not affect the parent anymore. IOW, an exec is
>> >> accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but
>> >> it seems like a non-trivial uapi change but there might be few users
>> >> that do clone{3}(CLONE_FS) followed by an exec.
>> >
>> >I believe the following code in Chromium explicitly relies on this
>> >behavior, but I'm not sure whether this code is in active use anymore:
>> >
>> >https://source.chromium.org/chromium/chromium/src/+/main:sandbox/linux/suid/sandbox.c;l=101?q=CLONE_FS&sq=&ss=chromium
>>
>> Oh yes. I think I had tried to forget this existed. Ugh. Okay, so back to the drawing board, I guess. The counting will need to be fixed...
>>
>> It's possible we can move the counting after dethread -- it seems the early count was just to avoid setting flags after the point of no return, but it's not an error condition...
>>
>
> I landed here from git blame.
>
> I was looking at sanitizing shared fs vs suid handling, but the entire
> ordeal is so convoluted I'm confident the best way forward is to whack
> the problem to begin with.
>
> Per the above link, the notion of a shared fs struct across different
> processes is depended on so merely unsharing is a no-go.
>
> However, the shared state is only a problem for suid/sgid.
>
> Here is my proposal: *deny* exec of suid/sgid binaries if fs_struct is
> shared. This will have to be checked for after the execing proc becomes
> single-threaded ofc.
>
> While technically speaking this does introduce a change in behavior,
> there is precedent for doing it and seeing if anyone yells.
>
> With this in place there is no point maintaining ->in_exec or checking
> the flag.
>
> There is the known example of depending on shared fs_struct across exec.
> Hopefully there is no example of depending on execing a suid/sgid binary
> in such a setting -- it would be quite a weird setup given that for
> security reasons the perms must not be changed.
>
> The upshot of this method is that any breakage will be immediately
> visible in the form of a failed exec.
>
> Another route would be to do the mandatory unshare but only for
> suid/sgid, except that would have a hidden failure (if you will).
>
> Comments?

What is the problem that is trying to be fixed?

A uapi change to not allow sharing an fs_struct for processes that
change their cred on exec seems possible. I said changing cred instead
of suid/sgid because there are capabilities and LSM labels that we
probably want this to apply to as well. I think such a limitation can
be justified on the grounds that a shared fs_struct is likely to
confuse suid executables.

Earlier in the thread there was talk about the refcount for fs_struct.
I don't see that problem at the moment, and I don't see how dealing
with suid+sgid executables will have any bearing on how the refcount
works.

Eric
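Christian's uapi concern rests on the CLONE_FS semantics: parent and child share a single fs_struct, so a chdir() in one is visible in the other. Below is a minimal user-space sketch of that sharing, assuming Linux and glibc's clone() wrapper; the helper names (parent_cwd_follows_child, child_chdir) are hypothetical and exist only for illustration. It deliberately does not call execve(): per the thread, the sharing currently survives exec (which is what the Chromium sandbox depends on), but this sketch only shows the pre-exec behavior.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Child side (hypothetical helper): change cwd, report success via exit code. */
static int child_chdir(void *arg)
{
    return chdir((const char *)arg) == 0 ? 0 : 1;
}

/*
 * Hypothetical helper: spawn a child with clone(), with or without
 * CLONE_FS, let it chdir() to `target`, then check whether the parent's
 * cwd moved too. Returns 1 if it did, 0 if not, -1 on error.
 * The parent's original cwd is restored before returning.
 */
int parent_cwd_follows_child(int share_fs, const char *target)
{
    char saved[PATH_MAX], after[PATH_MAX], want[PATH_MAX];
    int flags = SIGCHLD | (share_fs ? CLONE_FS : 0);
    int status, result = -1;
    char *stack;
    pid_t pid;

    assert(target != NULL);
    if (!getcwd(saved, sizeof saved) || !realpath(target, want))
        return -1;
    stack = malloc(CHILD_STACK_SIZE);
    if (!stack)
        return -1;

    /* The stack grows down on common architectures: pass its top. */
    pid = clone(child_chdir, stack + CHILD_STACK_SIZE, flags, (void *)target);
    if (pid != -1 && waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
        getcwd(after, sizeof after))
        result = strcmp(after, want) == 0;

    free(stack);
    if (chdir(saved) != 0)  /* undo the child's effect on a shared fs_struct */
        result = -1;
    return result;
}
```

Called from a process whose cwd is not `/`, `parent_cwd_follows_child(1, "/")` is expected to return 1 (the child's chdir() moved the parent too) and `parent_cwd_follows_child(0, "/")` to return 0.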