Re: [PATCH bpf-next v3 2/4] bpf, libbpf: Support global percpu data

On Tue, May 27, 2025 at 4:25 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Tue, May 27, 2025 at 3:40 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
> > > +
> > > +       data_sz = map->def.value_size;
> > > +       if (is_percpu) {
> > > +               num_cpus = libbpf_num_possible_cpus();
> > > +               if (num_cpus < 0) {
> > > +                       err = num_cpus;
> > > +                       return err;
> > > +               }
> > > +
> > > +               data_sz = data_sz * num_cpus;
> > > +               data = malloc(data_sz);
> > > +               if (!data) {
> > > +                       err = -ENOMEM;
> > > +                       return err;
> > > +               }
> > > +
> > > +               elem_sz = map->def.value_size;
> > > +               for (i = 0; i < num_cpus; i++)
> > > +                       memcpy(data + i * elem_sz, map->mmaped, elem_sz);
> > > +       } else {
> > > +               data = map->mmaped;
> > > +       }
> > >
> > >         if (obj->gen_loader) {
> > >                 bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps,
> > > -                                        map->mmaped, map->def.value_size);
> > > +                                        data, data_sz);
> >
> > I missed it earlier, but now I wonder how this is supposed to work?
> > skel and lskel may be generated on a system with N cpus,
> > but loaded with M cpus.
> >
> > Another concern is that the num_cpus multiplier can be huge.
> > lskel adds all that init data into a global array.
> > Pls avoid this multiplier.
>
> Hm... For skel, the number of CPUs at runtime isn't a problem, it's
> only memory waste for this temporary data. But it is forced on us by
> kernel contract for MAP_UPDATE_ELEM for per-CPU maps.
>
> Should we have a flag for map update command for per-CPU maps that
> would mean "use this data as a value for each CPU"? Then we can
> provide just a small piece of initialization data and not have to rely
> on the number of CPUs. This will also make lskel part very simple.

Initially it felt too specific, but I think it makes sense.
The contract was too restrictive. Let's add the flag.
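To sketch what that could look like on the libbpf side (flag name and
value below are made-up placeholders for this sketch, not uapi):

        #include <bpf/bpf.h>

        /* Placeholder for the proposed "use this data as the value for
         * every CPU" flag; the name and value are invented for this sketch.
         */
        #define BPF_F_ALL_CPUS_SKETCH (1ULL << 3)

        /* Initialize a per-CPU global-data map from a single
         * value_size-sized image, relying on the proposed kernel-side
         * replication instead of a value_size * nr_cpus buffer.
         */
        static int populate_percpu_map(int map_fd, const void *one_elem)
        {
                __u32 zero = 0;

                return bpf_map_update_elem(map_fd, &zero, one_elem,
                                           BPF_F_ALL_CPUS_SKETCH);
        }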

> Alternatively (and perhaps more flexibly) we can extend
> MAP_UPDATE_ELEM with ability to specify specific CPU for per-CPU maps.
> I'd probably have a MAP_LOOKUP_ELEM counterpart for this as well. Then
> skeleton/light skeleton code can iterate given number of times to
> initialize all CPUs using small initial data image.

I guess this can be a follow-up.
With an extra flag, lookup/update/delete could read a new field
in that anonymous struct:
        struct { /* anonymous struct used by BPF_MAP_*_ELEM and BPF_MAP_FREEZE commands */
                __u32           map_fd;
                __aligned_u64   key;
                union {
                        __aligned_u64 value;
                        __aligned_u64 next_key;
                };
                __u64           flags;
        };
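
E.g. (sketch only; the field name and placement are hypothetical, and iirc
there is already a 4-byte hole after map_fd that such a field could reuse
without changing the layout):

        struct { /* anonymous struct used by BPF_MAP_*_ELEM and BPF_MAP_FREEZE commands */
                __u32           map_fd;
                __u32           cpu;    /* hypothetical: consumed only when
                                         * the new per-CPU flag is set
                                         */
                __aligned_u64   key;
                union {
                        __aligned_u64 value;
                        __aligned_u64 next_key;
                };
                __u64           flags;
        };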

There are also "batch" versions of lookup/update/delete.
They would probably need to be extended as well, for consistency?
So I'd only go with the "use data to update all CPUs" flag for now.
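
Fwiw, if/when the batch commands get the same treatment, their uapi struct
already carries per-element flags, so there is a natural place to check the
new flag (quoting from memory, roughly):

        struct { /* struct used by BPF_MAP_*_BATCH commands */
                __aligned_u64   in_batch;
                __aligned_u64   out_batch;
                __aligned_u64   keys;
                __aligned_u64   values;
                __u32           count;
                __u32           map_fd;
                __u64           elem_flags;
                __u64           flags;
        } batch;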




