Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec

"Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> · Tue, 13 May 2025 10:29:47 -0500

Mateusz Guzik <mjguzik@xxxxxxxxx> writes:

> On Thu, Oct 06, 2022 at 08:25:01AM -0700, Kees Cook wrote:
>> On October 6, 2022 7:13:37 AM PDT, Jann Horn <jannh@xxxxxxxxxx> wrote:
>> >On Thu, Oct 6, 2022 at 11:05 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>> >> On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote:
>> >> > The check_unsafe_exec() counting of n_fs would not add up under a heavily
>> >> > threaded process trying to perform a suid exec, causing the suid portion
>> >> > to fail. This counting error appears to be unneeded, but to catch any
>> >> > possible conditions, explicitly unshare fs_struct on exec, if it ends up
>> >>
>> >> Isn't this a potential uapi break? Afaict, before this change a call to
>> >> clone{3}(CLONE_FS) followed by an exec in the child would have the
>> >> parent and child share fs information. So if the child e.g., changes the
>> >> working directory post exec it would also affect the parent. But after
>> >> this change here this would no longer be true. So a child changing a
>> >> working directory would not affect the parent anymore. IOW, an exec is
>> >> accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but
>> >> it seems like a non-trivial uapi change but there might be few users
>> >> that do clone{3}(CLONE_FS) followed by an exec.
>> >
>> >I believe the following code in Chromium explicitly relies on this
>> >behavior, but I'm not sure whether this code is in active use anymore:
>> >
>> >https://source.chromium.org/chromium/chromium/src/+/main:sandbox/linux/suid/sandbox.c;l=101?q=CLONE_FS&sq=&ss=chromium
>> 
>> Oh yes. I think I had tried to forget this existed. Ugh. Okay, so back to the drawing board, I guess. The counting will need to be fixed...
>> 
>> It's possible we can move the counting after dethread -- it seems the early count was just to avoid setting flags after the point of no return, but it's not an error condition...
>> 
>
> I landed here from git blame.
>
> I was looking at sanitizing shared fs vs suid handling, but the entire
> ordeal is so convoluted I'm confident the best way forward is to whack
> the problem to begin with.
>
> Per the above link, the notion of a shared fs struct across different
> processes is depended on so merely unsharing is a no-go.
>
> However, the shared state is only a problem for suid/sgid.
>
> Here is my proposal: *deny* exec of suid/sgid binaries if fs_struct is
> shared. This will have to be checked for after the execing proc becomes
> single-threaded ofc.
>
> While technically speaking this does introduce a change in behavior,
> there is precedent for doing it and seeing if anyone yells.
>
> With this in place there is no point maintaining ->in_exec or checking
> the flag.
>
> There is the known example of depending on shared fs_struct across exec.
> Hopefully there is no example of depending on execing a suid/sgid binary
> in such a setting -- it would be quite a weird setup given that for
> security reasons the perms must not be changed.
>
> The upshot of this method is that any breakage will be immediately
> visible in the form of a failed exec.
>
> Another route would be to do the mandatory unshare but only for
> suid/sgid, except that would have a hidden failure (if you will).
>
> Comments?

What is the problem that is trying to be fixed?

A uapi change to not allow sharing a fs_struct for processes that change
their cred on exec seems possible.

I said changing cred instead of suid/sgid because there are capabilities
and LSM labels that we probably want this to apply to as well.
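
To make the rule being discussed concrete, here is a minimal userspace sketch.
It is not kernel code: struct fs_model, exec_check_model() and the choice of
-EPERM are all hypothetical, picked only for illustration. It assumes the
check runs after the exec'ing process has become single-threaded, so the
process's own threads no longer contribute to the user count.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the kernel's fs_struct; only the
 * field this sketch needs is modelled.
 */
struct fs_model {
	int users;	/* number of tasks sharing this fs state */
};

/*
 * Model of the proposed rule: refuse an exec that would change
 * credentials (suid/sgid, file capabilities, LSM label transition)
 * while the fs state is still shared with another task.
 *
 * Returns 0 if the exec may proceed, a negative errno otherwise.
 * The specific error code is an assumption, not taken from the thread.
 */
int exec_check_model(const struct fs_model *fs, bool creds_change)
{
	if (creds_change && fs->users > 1)
		return -EPERM;
	return 0;
}
```

An exec that leaves the creds alone passes regardless of sharing, which is
what keeps the CLONE_FS-across-exec usage quoted above working.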

I think such a limitation can be justified on the grounds that a shared
fs_struct is likely to allow suid executables to be confused.
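
For anyone who wants to see the sharing in action, the following userspace
sketch shows a task created with CLONE_FS changing the working directory
out from under the task it shares with. The helper names are made up for
this example, it assumes /tmp exists and differs from /, and it leaves out
the exec step entirely (per the discussion quoted above, the sharing
currently survives exec).

```c
#define _GNU_SOURCE
#include <limits.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* Child entry point: change the working directory, then exit. */
static int child_chdir_root(void *arg)
{
	(void)arg;
	return chdir("/") == 0 ? 0 : 1;
}

/*
 * Create a child with the given extra clone flags, let it chdir("/"),
 * and report whether the caller's cwd changed as a result.
 * Returns 1 if it changed, 0 if it did not, -1 on error.
 */
int child_chdir_visible(int extra_flags)
{
	char before[PATH_MAX], after[PATH_MAX];
	char *stack;
	int status;
	pid_t pid;

	/* Start somewhere other than "/" so a change is observable. */
	if (chdir("/tmp") != 0 || getcwd(before, sizeof(before)) == NULL)
		return -1;

	stack = malloc(CHILD_STACK_SIZE);
	if (stack == NULL)
		return -1;

	/* clone() wants the top of the stack on downward-growing stacks. */
	pid = clone(child_chdir_root, stack + CHILD_STACK_SIZE,
		    extra_flags | SIGCHLD, NULL);
	if (pid < 0) {
		free(stack);
		return -1;
	}

	if (waitpid(pid, &status, 0) != pid) {
		free(stack);
		return -1;
	}
	free(stack);

	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	if (getcwd(after, sizeof(after)) == NULL)
		return -1;

	return strcmp(before, after) != 0;
}
```

With CLONE_FS the child's chdir() is expected to show up in the caller;
without it the child gets its own copy of the fs state and the caller's
cwd stays put. A suid executable on the receiving end of that has no way
to know its cwd or root moved.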

Earlier in the thread there was talk about the refcount for fs_struct.
I don't see that problem at the moment, and I don't see how dealing with
suid+sgid executables will have any bearing on how the refcount works.

Eric