Re: [PATCH bpf-next v3 2/4] bpf, libbpf: Support global percpu data

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 28 May 2025 09:05:01 -0700

On Tue, May 27, 2025 at 7:35 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, May 27, 2025 at 4:25 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Tue, May 27, 2025 at 3:40 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
> > > > +
> > > > +       data_sz = map->def.value_size;
> > > > +       if (is_percpu) {
> > > > +               num_cpus = libbpf_num_possible_cpus();
> > > > +               if (num_cpus < 0) {
> > > > +                       err = num_cpus;
> > > > +                       return err;
> > > > +               }
> > > > +
> > > > +               data_sz = data_sz * num_cpus;
> > > > +               data = malloc(data_sz);
> > > > +               if (!data) {
> > > > +                       err = -ENOMEM;
> > > > +                       return err;
> > > > +               }
> > > > +
> > > > +               elem_sz = map->def.value_size;
> > > > +               for (i = 0; i < num_cpus; i++)
> > > > +                       memcpy(data + i * elem_sz, map->mmaped, elem_sz);
> > > > +       } else {
> > > > +               data = map->mmaped;
> > > > +       }
> > > >
> > > >         if (obj->gen_loader) {
> > > >                 bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps,
> > > > -                                        map->mmaped, map->def.value_size);
> > > > +                                        data, data_sz);
> > >
> > > I missed it earlier, but now I wonder how this is supposed to work ?
> > > skel and lskel may be generated on a system with N cpus,
> > > but loaded with M cpus.
> > >
> > > Another concern is num_cpus multiplier can be huge.
> > > lksel adds all that init data into a global array.
> > > Pls avoid this multiplier.
> >
> > Hm... For skel, the number of CPUs at runtime isn't a problem, it's
> > only memory waste for this temporary data. But it is forced on us by
> > kernel contract for MAP_UPDATE_ELEM for per-CPU maps.
> >
> > Should we have a flag for map update command for per-CPU maps that
> > would mean "use this data as a value for each CPU"? Then we can
> > provide just a small piece of initialization data and not have to rely
> > on the number of CPUs. This will also make lskel part very simple.
>
> Initially it felt too specific, but I think it makes sense.
> The contract was too restrictive. Let's add the flag.
>
> > Alternatively (and perhaps more flexibly) we can extend
> > MAP_UPDATE_ELEM with ability to specify specific CPU for per-CPU maps.
> > I'd probably have a MAP_LOOKUP_ELEM counterpart for this as well. Then
> > skeleton/light skeleton code can iterate given number of times to
> > initialize all CPUs using small initial data image.
>
> I guess this can be a follow up.
> With extra flag lookup/update/delete can look into a new field
> in that anonymous struct:
>         struct { /* anonymous struct used by BPF_MAP_*_ELEM and
> BPF_MAP_FREEZE commands */
>                 __u32           map_fd;
>                 __aligned_u64   key;
>                 union {
>                         __aligned_u64 value;
>                         __aligned_u64 next_key;
>                 };
>                 __u64           flags;
>         };
>

Yep, we'd have two flags: one for "apply across all CPUs", and another
meaning "apply for specified CPU" + new CPU number field. Or the same
flag with a special CPU number value (0xffffffff?).

> There is also "batch" version of lookup/update/delete.
> They probably will need to be extended as well for consistency ?
> So I'd only go with the "use data to update all CPUs" flag for now.

Agreed. But also looking at generic_map_update_batch() it seems like
it just routes everything through single-element updates, so it
shouldn't be hard to add batch support for all this either.