Re: [PATCH bpf-next v3 5/6] libbpf: Support BPF_F_CPU for percpu maps

"Leon Hwang" <leon.hwang@xxxxxxxxx> · Tue, 26 Aug 2025 23:35:27 +0800

On Sat Aug 23, 2025 at 6:20 AM +08, Andrii Nakryiko wrote:
> On Thu, Aug 21, 2025 at 9:09 AM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
>>

[...]

>> @@ -10630,6 +10630,19 @@ static int validate_map_op(const struct bpf_map *map, size_t key_sz,
>>                 int num_cpu = libbpf_num_possible_cpus();
>>                 size_t elem_sz = roundup(map->def.value_size, 8);
>>
>> +               if (flags & (BPF_F_CPU | BPF_F_ALL_CPUS)) {
>> +                       if ((flags & BPF_F_CPU) && (flags & BPF_F_ALL_CPUS))
>> +                               return -EINVAL;
>> +                       if ((flags >> 32) >= num_cpu)
>> +                               return -ERANGE;
>
> The idea of validate_map_op() is to make it easier for users to
> understand what's wrong with how they deal with the map, rather than
> just getting indiscriminate -EINVAL from the kernel.
>
> Point being: add human-readable pr_warn() explanations for all the new
> conditions you are detecting, otherwise it's just meaningless.
>

Ack.

I'll add these pr_warn() explanations in next revision.

>> +                       if (value_sz != elem_sz) {
>> +                               pr_warn("map '%s': unexpected value size %zu provided for per-CPU map, expected %zu\n",
>> +                                       map->name, value_sz, elem_sz);
>> +                               return -EINVAL;
>> +                       }
>> +                       break;
>> +               }
>> +
>>                 if (value_sz != num_cpu * elem_sz) {
>>                         pr_warn("map '%s': unexpected value size %zu provided for per-CPU map, expected %d * %zu = %zd\n",
>>                                 map->name, value_sz, num_cpu, elem_sz, num_cpu * elem_sz);
>> @@ -10654,7 +10667,7 @@ int bpf_map__lookup_elem(const struct bpf_map *map,
>>  {
>>         int err;
>>
>> -       err = validate_map_op(map, key_sz, value_sz, true);
>> +       err = validate_map_op(map, key_sz, value_sz, true, flags);
>>         if (err)
>>                 return libbpf_err(err);
>>
>> @@ -10667,7 +10680,7 @@ int bpf_map__update_elem(const struct bpf_map *map,
>>  {
>>         int err;
>>
>> -       err = validate_map_op(map, key_sz, value_sz, true);
>> +       err = validate_map_op(map, key_sz, value_sz, true, flags);
>>         if (err)
>>                 return libbpf_err(err);
>>
>> @@ -10679,7 +10692,7 @@ int bpf_map__delete_elem(const struct bpf_map *map,
>>  {
>>         int err;
>>
>> -       err = validate_map_op(map, key_sz, 0, false /* check_value_sz */);
>> +       err = validate_map_op(map, key_sz, 0, false /* check_value_sz */, 0);
>
> hard-coded 0 instead of flags, why?
>

It should be flags.

However, delete op does not support the introduced cpu flags.

I think it's OK to use 0 here.

>>         if (err)
>>                 return libbpf_err(err);
>>
>> @@ -10692,7 +10705,7 @@ int bpf_map__lookup_and_delete_elem(const struct bpf_map *map,
>>  {
>>         int err;
>>
>> -       err = validate_map_op(map, key_sz, value_sz, true);
>> +       err = validate_map_op(map, key_sz, value_sz, true, 0);
>
> same about flags
>

Ack.

>>         if (err)
>>                 return libbpf_err(err);
>>
>> @@ -10704,7 +10717,7 @@ int bpf_map__get_next_key(const struct bpf_map *map,
>>  {
>>         int err;
>>
>> -       err = validate_map_op(map, key_sz, 0, false /* check_value_sz */);
>> +       err = validate_map_op(map, key_sz, 0, false /* check_value_sz */, 0);
>>         if (err)
>>                 return libbpf_err(err);
>>
>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>> index 2e91148d9b44d..6a972a8d060c3 100644
>> --- a/tools/lib/bpf/libbpf.h
>> +++ b/tools/lib/bpf/libbpf.h
>> @@ -1196,12 +1196,13 @@ LIBBPF_API struct bpf_map *bpf_map__inner_map(struct bpf_map *map);
>>   * @param key_sz size in bytes of key data, needs to match BPF map definition's **key_size**
>>   * @param value pointer to memory in which looked up value will be stored
>>   * @param value_sz size in byte of value data memory; it has to match BPF map
>> - * definition's **value_size**. For per-CPU BPF maps value size has to be
>> - * a product of BPF map value size and number of possible CPUs in the system
>> - * (could be fetched with **libbpf_num_possible_cpus()**). Note also that for
>> - * per-CPU values value size has to be aligned up to closest 8 bytes for
>> - * alignment reasons, so expected size is: `round_up(value_size, 8)
>> - * * libbpf_num_possible_cpus()`.
>> + * definition's **value_size**. For per-CPU BPF maps, value size can be
>> + * definition's **value_size** if **BPF_F_CPU** or **BPF_F_ALL_CPUS** is
>> + * specified in **flags**, otherwise a product of BPF map value size and number
>> + * of possible CPUs in the system (could be fetched with
>> + * **libbpf_num_possible_cpus()**). Note else that for per-CPU values value
>> + * size has to be aligned up to closest 8 bytes for alignment reasons, so
>
> nit: aligned up for alignment reasons... drop "for alignment reasons", I guess?
>

It is "for alignment reasons", because percpu maps use bpf_long_memcpy()
to copy data.

static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
{
        const long *lsrc = src;
        long *ldst = dst;

        size /= sizeof(long);
        while (size--)
                data_race(*ldst++ = *lsrc++);
}

Thanks,
Leon

[...]