Re: [PATCH bpf-next v3 2/5] bpf: Implement mprog API on top of existing cgroup progs

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 27 May 2025 14:36:46 -0700

On Fri, May 23, 2025 at 6:04 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
>
>
> On 5/22/25 1:45 PM, Andrii Nakryiko wrote:
> > On Sat, May 17, 2025 at 9:27 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >> Current cgroup prog ordering is appending at attachment time. This is not
> >> ideal. In some cases, users want specific ordering at a particular cgroup
> >> level. To address this, the existing mprog API seems an ideal solution with
> >> supporting BPF_F_BEFORE and BPF_F_AFTER flags.
> >>
> >> But there are a few obstacles to directly use kernel mprog interface.
> >> Currently cgroup bpf progs already support prog attach/detach/replace
> >> and link-based attach/detach/replace. For example, in struct
> >> bpf_prog_array_item, the cgroup_storage field needs to be together
> >> with bpf prog. But the mprog API struct bpf_mprog_fp only has bpf_prog
> >> as the member, which makes it difficult to use kernel mprog interface.
> >>
> >> In another case, the current cgroup prog detach tries to use the
> >> same flag as in attach. This is different from mprog kernel interface
> >> which uses flags passed from user space.
> >>
> >> So to avoid modifying existing behavior, I made the following changes to
> >> support mprog API for cgroup progs:
> >>   - The support is for prog list at cgroup level. Cross-level prog list
> >>     (a.k.a. effective prog list) is not supported.
> >>   - Previously, BPF_F_PREORDER is supported only for prog attach, now
> >>     BPF_F_PREORDER is also supported by link-based attach.
> >>   - For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID/BPF_F_LINK is supported
> >>     similar to kernel mprog but with different implementation.
> >>   - For detach and replace, use the existing implementation.
> >>   - For attach, detach and replace, the revision for a particular prog
> >>     list, associated with a particular attach type, will be updated
> >>     by increasing count by 1.
> >>
> >> Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> >> ---
> >>   include/uapi/linux/bpf.h       |   7 ++
> >>   kernel/bpf/cgroup.c            | 195 +++++++++++++++++++++++++++++----
> >>   kernel/bpf/syscall.c           |  43 +++++---
> >>   tools/include/uapi/linux/bpf.h |   7 ++
> >>   4 files changed, 214 insertions(+), 38 deletions(-)
> >>

[...]

> >> +
> >> +       if (link || id || id_or_fd) {
> > please, use "is_id" to make it obvious that this is bool, it's very
> > confusing to see "id || id_or_fd"
> >
> > same for is_link, please
>
> Okay, I am following mprog.c code like below:
>
> ====
> static int bpf_mprog_link(struct bpf_tuple *tuple,
>                            u32 id_or_fd, u32 flags,
>                            enum bpf_prog_type type)
> {
>          struct bpf_link *link = ERR_PTR(-EINVAL);
>          bool id = flags & BPF_F_ID;
>
>          if (id)
>                  link = bpf_link_by_id(id_or_fd);
>          else if (id_or_fd)
>                  link = bpf_link_get_from_fd(id_or_fd);
> ====
>
> But agree is_id/is_link is more clear.

yeah, existing code is also confusing :)

>
> >
> >> +               /* flags must have either BPF_F_BEFORE or BPF_F_AFTER */
> >> +               if (!(flags & BPF_F_BEFORE) != !!(flags & BPF_F_AFTER))
> > either/or here means exclusive or inclusive?
> >
> > if it's inclusive: if (flags & (BPF_F_BEFORE | BPF_F_AFTER)) should be
> > enough to check that at least one of them is set
> >
> > if exclusive, below you use a different style of checking (which
> > arguably is easier to follow), so let's stay consistent
> >
> >
> > I got to say that my brain broke trying to reason about this pattern:
> >
> >     if (!(...) != !!(...))

Note how you used singular ! in the left condition. Probably a typo,
but I wasn't sure if that was intentional.

> >
> > Way too many exclamations/negations, IMO... I'm not sure what sort of
> > condition we are expressing here?
>
> Sorry for confusion. What I mean is 'exclusive'. I guess I can do

It's fine to use a sort-of-standard pattern of !!(...) != !!(...),
just make sure to use the right amount of !!

>
> bool is_before = flags & BPF_F_BEFORE;
> bool is_after = flags & BPF_F_AFTER;
> if (is_link || is_id || id_or_fd) {
>      if (is_before == is_after)
>          return ERR_PTR(-EINVAL);
> } else if (!hist_empty(progs)) {
>      if (is_before && is_after)
>          return ERR_PTR(-EINVAL);
>      ...
> }
>
> >
> > pw-bot: cr
> >
> >> +                       return ERR_PTR(-EINVAL);
> >> +       } else if (!hlist_empty(progs)) {
> >> +               /* flags cannot have both BPF_F_BEFORE and BPF_F_AFTER */
> >> +               if ((flags & BPF_F_BEFORE) && (flags & BPF_F_AFTER))
> >> +                       return ERR_PTR(-EINVAL);
> > do I understand correctly that neither BEFORE or AFTER might be set,
> > in which case it must be BPF_F_REPLACE, is that right? Can it happen
> > that we have neither REPLACE nor BEFORE/AFTER? Asked that below as
> > well...
>
> I think 'neither REPLACE nor BEFORE/AFTER' is possible. In that case,
> the prog is appended to the prog list.
>
> The code path here should not have REPLACE. See the code
>
>          if (pl) {
>                  old_prog = pl->prog;
>          } else {
>                  pl = kmalloc(sizeof(*pl), GFP_KERNEL);
>                  if (!pl) {
>                          bpf_cgroup_storages_free(new_storage);
>                          return -ENOMEM;
>                  }
>
>                  err = insert_pl_to_hlist(pl, progs, prog ? : link->link.prog, flags, id_or_fd);
>                  if (err) {
>                          kfree(pl);
>                          bpf_cgroup_storages_free(new_storage);
>                          return err;
>                  }
>          }
>
> If REPLACE is in the flag and prog replacement is successful, 'pl'
> will not be null.
>

ok, thanks

> >
> >> +       }
> >> +
> >> +       if (link) {
> >> +               anchor_link = bpf_get_anchor_link(flags, id_or_fd, prog->type);
> >> +               if (IS_ERR(anchor_link))
> >> +                       return ERR_PTR(PTR_ERR(anchor_link));
> >> +               anchor_prog = anchor_link->prog;
> >> +       } else if (id || id_or_fd) {
> >> +               anchor_prog = bpf_get_anchor_prog(flags, id_or_fd, prog->type);
> >> +               if (IS_ERR(anchor_prog))
> >> +                       return ERR_PTR(PTR_ERR(anchor_prog));
> >> +       }
> >> +
> >> +       if (!anchor_prog) {
> >> +               /* if there is no anchor_prog, then BPF_F_PREORDER doesn't matter
> >> +                * since either prepend or append to a combined list of progs will
> >> +                * end up with correct result.
> >> +                */
> >> +               hlist_for_each_entry(pltmp, progs, node) {
> >> +                       if (flags & BPF_F_BEFORE)
> >> +                               return pltmp;
> >> +                       if (pltmp->node.next)
> >> +                               continue;
> >> +                       return pltmp;
> >> +               }
> >> +               return NULL;
> >> +       }
> >> +
> >> +       hlist_for_each_entry(pltmp, progs, node) {
> >> +               pltmp_prog = pltmp->link ? pltmp->link->link.prog : pltmp->prog;
> >> +               if (pltmp_prog != anchor_prog)
> >> +                       continue;
> >> +               if (!!(pltmp->flags & BPF_F_PREORDER) != preorder)
> >> +                       goto out;
> > hm... thinking about this a bit more, is it illegal to have the same
> > BPF program attached as PREORDER and POSTORDER? That seems legit to
> > me, do we artificially disallow this?
>
> Good question, in find_attach_entry(), we have
>
>          hlist_for_each_entry(pl, progs, node) {
>                  if (prog && pl->prog == prog && prog != replace_prog)
>                          /* disallow attaching the same prog twice */
>                          return ERR_PTR(-EINVAL);
>                  if (link && pl->link == link)
>                          /* disallow attaching the same link twice */
>                          return ERR_PTR(-EINVAL);
>          }
>
> Basically, two same progs are not allowed. Here we didn't check PREORDER flag.
> Should we relax this for this patch set?

So with BPF link-based attachment we already allow multiple same BPF
programs to be attached, regardless of PREORDER. I don't think we need
to make any relaxations for old-style program attachment. But your
mprog API is, effectively ignoring link stuff and checking for
uniqueness of the program (regardless of whether it came from link or
not), which is problematic, I think, no?

[...]

> >> @@ -640,7 +765,8 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
> >>   static int __cgroup_bpf_attach(struct cgroup *cgrp,
> >>                                 struct bpf_prog *prog, struct bpf_prog *replace_prog,
> >>                                 struct bpf_cgroup_link *link,
> >> -                              enum bpf_attach_type type, u32 flags)
> >> +                              enum bpf_attach_type type, u32 flags, u32 id_or_fd,
> >> +                              u64 revision)
> >>   {
> >>          u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI));
> >>          struct bpf_prog *old_prog = NULL;
> >> @@ -656,6 +782,9 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
> >>              ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI)))
> >>                  /* invalid combination */
> >>                  return -EINVAL;
> >> +       if ((flags & BPF_F_REPLACE) && (flags & (BPF_F_BEFORE | BPF_F_AFTER)))
> > but can it be that neither is set?
>
> I would say it is possible. In that case, the new prog is appended to
> the end of prog list.

ok, makes sense

>
> >
> >> +               /* only either replace or insertion with before/after */
> >> +               return -EINVAL;
> >>          if (link && (prog || replace_prog))
> >>                  /* only either link or prog/replace_prog can be specified */
> >>                  return -EINVAL;
> > [...]
>