Re: [PATCH bpf-next v2 1/2] bpf: Do not limit bpf_cgroup_from_id to current's namespace

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 12 Aug 2025 16:20:10 -0700

On Mon, Aug 11, 2025 at 12:59 PM Kumar Kartikeya Dwivedi
<memxor@xxxxxxxxx> wrote:
>
> The bpf_cgroup_from_id kfunc relies on cgroup_get_from_id to obtain the
> cgroup corresponding to a given cgroup ID. This helper can be called in
> a lot of contexts where the current thread can be random. A recent
> example was its use in sched_ext's ops.tick(), to obtain the root cgroup
> pointer. Since the current task can be whatever random user space task
> preempted by the timer tick, this makes the behavior of the helper
> unreliable.
>
> Resolve this by refactoring cgroup_get_from_id to take a parameter to
> elide the cgroup_is_descendant check when root_cgns parameter is set to
> true.
>
> There is no compatibility breakage here, since changing the namespace
> against which the lookup is being done to the root cgroup namespace only
> permits a wider set of lookups to succeed now. The cgroup IDs across
> namespaces are globally unique, and thus don't need to be retranslated.
>
> Reported-by: Dan Schatzberg <dschatzberg@xxxxxxxx>
> Acked-by: Tejun Heo <tj@xxxxxxxxxx>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
> ---
>  include/linux/cgroup.h   | 2 +-
>  kernel/bpf/cgroup_iter.c | 2 +-
>  kernel/bpf/helpers.c     | 2 +-
>  kernel/cgroup/cgroup.c   | 7 ++++++-

hmm... I see mm/memcontrol.c and block/blk-cgroup-fc-appid.c using
this function as well, but you didn't update them, so under some
kernel config this will break?

but I'm also wondering if it wouldn't be cleaner to keep
cgroup_get_from_id() as is, and add __cgroup_get_from_id(u64 id) that
would not perform a cgroup_is_descendant() check. BPF helpers that
don't need this root descendancy would call __cgroup_get_from_id()
directly, while cgroup_get_from_id() would use __cgroup_get_from_id()
first, and then (if successful) would perform addition root_cgrp
check?

>  4 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index b18fb5fcb38e..da757a496fbe 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -650,7 +650,7 @@ static inline void cgroup_kthread_ready(void)
>  }
>
>  void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen);
> -struct cgroup *cgroup_get_from_id(u64 id);
> +struct cgroup *cgroup_get_from_id(u64 id, bool root_cgns);
>  #else /* !CONFIG_CGROUPS */
>
>  struct cgroup_subsys_state;
> diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
> index f04a468cf6a7..49234d035583 100644
> --- a/kernel/bpf/cgroup_iter.c
> +++ b/kernel/bpf/cgroup_iter.c
> @@ -212,7 +212,7 @@ static int bpf_iter_attach_cgroup(struct bpf_prog *prog,
>         if (fd)
>                 cgrp = cgroup_v1v2_get_from_fd(fd);
>         else if (id)
> -               cgrp = cgroup_get_from_id(id);
> +               cgrp = cgroup_get_from_id(id, false);
>         else /* walk the entire hierarchy by default. */
>                 cgrp = cgroup_get_from_path("/");
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 6b4877e85a68..12466103917f 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2537,7 +2537,7 @@ __bpf_kfunc struct cgroup *bpf_cgroup_from_id(u64 cgid)
>  {
>         struct cgroup *cgrp;
>
> -       cgrp = cgroup_get_from_id(cgid);
> +       cgrp = cgroup_get_from_id(cgid, true);
>         if (IS_ERR(cgrp))
>                 return NULL;
>         return cgrp;
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 312c6a8b55bb..b490e1e0d2c4 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -6345,10 +6345,11 @@ void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
>  /*
>   * cgroup_get_from_id : get the cgroup associated with cgroup id
>   * @id: cgroup id
> + * @root_cgns: Select root cgroup namespace instead of current's.
>   * On success return the cgrp or ERR_PTR on failure
>   * Only cgroups within current task's cgroup NS are valid.
>   */
> -struct cgroup *cgroup_get_from_id(u64 id)
> +struct cgroup *cgroup_get_from_id(u64 id, bool root_cgns)
>  {
>         struct kernfs_node *kn;
>         struct cgroup *cgrp, *root_cgrp;
> @@ -6374,6 +6375,10 @@ struct cgroup *cgroup_get_from_id(u64 id)
>         if (!cgrp)
>                 return ERR_PTR(-ENOENT);
>
> +       /* We don't need to namespace this operation against current. */
> +       if (root_cgns)
> +               return cgrp;
> +
>         root_cgrp = current_cgns_cgroup_dfl();
>         if (!cgroup_is_descendant(cgrp, root_cgrp)) {
>                 cgroup_put(cgrp);
> --
> 2.47.3
>