On Tue, May 27, 2025 at 4:25 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > > On Tue, May 27, 2025 at 3:40 PM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Mon, May 26, 2025 at 9:22 AM Leon Hwang <leon.hwang@xxxxxxxxx> wrote: > > > + > > > + data_sz = map->def.value_size; > > > + if (is_percpu) { > > > + num_cpus = libbpf_num_possible_cpus(); > > > + if (num_cpus < 0) { > > > + err = num_cpus; > > > + return err; > > > + } > > > + > > > + data_sz = data_sz * num_cpus; > > > + data = malloc(data_sz); > > > + if (!data) { > > > + err = -ENOMEM; > > > + return err; > > > + } > > > + > > > + elem_sz = map->def.value_size; > > > + for (i = 0; i < num_cpus; i++) > > > + memcpy(data + i * elem_sz, map->mmaped, elem_sz); > > > + } else { > > > + data = map->mmaped; > > > + } > > > > > > if (obj->gen_loader) { > > > bpf_gen__map_update_elem(obj->gen_loader, map - obj->maps, > > > - map->mmaped, map->def.value_size); > > > + data, data_sz); > > > > I missed it earlier, but now I wonder how this is supposed to work ? > > skel and lskel may be generated on a system with N cpus, > > but loaded with M cpus. > > > > Another concern is num_cpus multiplier can be huge. > > lksel adds all that init data into a global array. > > Pls avoid this multiplier. > > Hm... For skel, the number of CPUs at runtime isn't a problem, it's > only memory waste for this temporary data. But it is forced on us by > kernel contract for MAP_UPDATE_ELEM for per-CPU maps. > > Should we have a flag for map update command for per-CPU maps that > would mean "use this data as a value for each CPU"? Then we can > provide just a small piece of initialization data and not have to rely > on the number of CPUs. This will also make lskel part very simple. Initially it felt too specific, but I think it makes sense. The contract was too restrictive. Let's add the flag. > Alternatively (and perhaps more flexibly) we can extend > MAP_UPDATE_ELEM with ability to specify specific CPU for per-CPU maps. > I'd probably have a MAP_LOOKUP_ELEM counterpart for this as well. Then > skeleton/light skeleton code can iterate given number of times to > initialize all CPUs using small initial data image. I guess this can be a follow up. With extra flag lookup/update/delete can look into a new field in that anonymous struct: struct { /* anonymous struct used by BPF_MAP_*_ELEM and BPF_MAP_FREEZE commands */ __u32 map_fd; __aligned_u64 key; union { __aligned_u64 value; __aligned_u64 next_key; }; __u64 flags; }; There is also "batch" version of lookup/update/delete. They probably will need to be extended as well for consistency ? So I'd only go with the "use data to update all CPUs" flag for now.