On 2025-07-18, Aleksa Sarai <cyphar@xxxxxxxxxx> wrote: > On 2025-07-18, nicolas.bouchinet@xxxxxxxxxxxxxxxxx <nicolas.bouchinet@xxxxxxxxxxxxxxxxx> wrote: > > From: Nicolas Bouchinet <nicolas.bouchinet@xxxxxxxxxxx> > > > > The hidepid mount option documentation defines the following modes: > > > > - "noaccess": user may not access any `/proc/<pid>/ directories but > > their own. > > - "invisible": all `/proc/<pid>/` will be fully invisible to other users. > > - "ptraceable": means that procfs should only contain `/proc/<pid>/` > > directories that the caller can ptrace. > > > > We thus expect that with "noaccess" and "invisible" users would be able to > > see their own processes in `/proc/<pid>/`. > > > > The implementation of hidepid however control accesses using the > > `ptrace_may_access()` function in any cases. Thus, if a process set > > itself as non-dumpable using the `prctl(PR_SET_DUMPABLE, > > SUID_DUMP_DISABLE)` it becomes invisible to the user. > > In my view, the documentation is wrong here. This behaviour has remained > effectively unchanged since it was introduced in 0499680a4214 ("procfs: > add hidepid= and gid= mount options"), and the documentation was written > by the same author (added to Cc, though they appear to be inactive since > 2013). hidepid=ptraceable was added many years later, and so the current > documentation seeming somewhat contradictory is probably more a result > of a new feature being documented without rewriting the old > documentation, rather than an incorrect implementation. > > A process marking itself with SUID_DUMP_DISABLE is a *very* strong > signal that other processes (even processes owned by the same user) must > have very restricted access to it. Given how many times they have been > instrumental for protecting against attacks, I am quite hesitant about > making changes to loosen these restrictions. > > For instance, container runtimes need to set SUID_DUMP_DISABLE to avoid > all sorts of breakout attacks (CVE-2016-9962 and CVE-2019-5736 being the > most famous examples, but there are plenty of others). If a container > has been configured with a restrictive hidepid, I would expect the > kernel to block most attempts to interact with such a process to > non-privileged users. But this patch would loosen said restrictions. > > Now, many of the bits in /proc/self/* are additionally gated behind > ptrace_may_access() checks, so this kind of change might be less > catastrophic than at first glance, but the original concerns that > motivated hidepid= were about /proc/self/cmdline and the uid/euid of > processes being discoverable, and AFAICS this patch still undoes those > protections for the cases we care about with SUID_DUMP_DISABLE. > > What motivated you to want to change this behaviour? > > > This patch fixes the `has_pid_permissions()` function in order to make > > its behavior to match the documentation. > > > > Note that since `ptrace_may_access()` is not called anymore with > > "noaccess" and "invisible", the `security_ptrace_access_check()` will no > > longer be called either. > > > > Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@xxxxxxxxxxx> > > --- > > fs/proc/base.c | 27 ++++++++++++++++++++++++--- > > 1 file changed, 24 insertions(+), 3 deletions(-) > > > > diff --git a/fs/proc/base.c b/fs/proc/base.c > > index c667702dc69b8ca2531e88e12ed7a18533f294dd..fb128cb5f95fe65016fce96c75aee18c762a30f2 100644 > > --- a/fs/proc/base.c > > +++ b/fs/proc/base.c > > @@ -746,9 +746,12 @@ static bool has_pid_permissions(struct proc_fs_info *fs_info, > > struct task_struct *task, > > enum proc_hidepid hide_pid_min) > > { > > + const struct cred *cred = current_cred(), *tcred; > > + kuid_t caller_uid; > > + kgid_t caller_gid; > > /* > > - * If 'hidpid' mount option is set force a ptrace check, > > - * we indicate that we are using a filesystem syscall > > + * If 'hidepid=ptraceable' mount option is set, force a ptrace check. > > + * We indicate that we are using a filesystem syscall > > * by passing PTRACE_MODE_READ_FSCREDS > > */ > > if (fs_info->hide_pid == HIDEPID_NOT_PTRACEABLE) > > @@ -758,7 +761,25 @@ static bool has_pid_permissions(struct proc_fs_info *fs_info, > > return true; > > if (in_group_p(fs_info->pid_gid)) > > return true; > > - return ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS); > > + > > + task_lock(task); > > + rcu_read_lock(); > > + caller_uid = cred->fsuid; > > + caller_gid = cred->fsgid; > > + tcred = __task_cred(task); > > + if (uid_eq(caller_uid, tcred->euid) && > > + uid_eq(caller_uid, tcred->suid) && > > + uid_eq(caller_uid, tcred->uid) && > > + gid_eq(caller_gid, tcred->egid) && > > + gid_eq(caller_gid, tcred->sgid) && > > + gid_eq(caller_gid, tcred->gid)) { > > + rcu_read_unlock(); > > + task_unlock(task); > > + return true; > > + } > > + rcu_read_unlock(); > > + task_unlock(task); > > + return false; > > At the very least, this check needs to be gated based on > ns_capable(get_task_mm(task)->user_ns, CAP_SYS_PTRACE), to avoid > containers from being able to introspect SUID_DUMP_DISABLE processes > (such as container runtimes) in the process of joining a user namespaced > container. Actually get_task_mm(test)->user_ns == current_user_ns() is probably want you want? ns_capable(..., CAP_SYS_PTRACE) is basically an equivalent of ptrace_may_access() here. But we very rarely do permission checks based just on the userns -- we almost always use ns_capable() since that actually handles the hierarchical relationship between user namespaces regarding privileges. Either way, I'm not a fan of this change, to be honest. > > > } > > > > > > > > --- > > base-commit: 884a80cc9208ce75831b2376f2b0464018d7f2c4 > > change-id: 20250718-hidepid_fix-d0743d0540e7 > > > > Best regards, > > -- > > Nicolas Bouchinet <nicolas.bouchinet@xxxxxxxxxxx> > > > > -- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/
Attachment:
signature.asc
Description: PGP signature