Re: [RFC PATCH bpf-next 1/3] bpf: Introduce BPF_F_CPU flag for percpu_array map

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 2 Jul 2025 10:13:58 -0700

On Wed, Jul 2, 2025 at 10:02 AM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
>
>
>
> On 2025/7/2 04:22, Andrii Nakryiko wrote:
> > On Tue, Jun 24, 2025 at 9:54 AM Leon Hwang <leon.hwang@xxxxxxxxx> wrote:
> >>
> >> This patch introduces support for the BPF_F_CPU flag in percpu_array maps
> >> to allow updating or looking up values for specific CPUs or for all CPUs
> >> with a single value.
> >>
> >> This enhancement enables:
> >>
> >> * Efficient update of all CPUs using a single value when cpu == 0xFFFFFFFF.
> >> * Targeted update or lookup for a specific CPU otherwise.
> >>
> >> The flag is passed via:
> >>
> >> * map_flags in bpf_percpu_array_update() along with the cpu field.
> >> * elem_flags in generic_map_update_batch() along with the cpu field.
> >>
> >> Signed-off-by: Leon Hwang <leon.hwang@xxxxxxxxx>
> >> ---
> >>  include/linux/bpf.h            |  5 +--
> >>  include/uapi/linux/bpf.h       |  6 ++++
> >>  kernel/bpf/arraymap.c          | 46 ++++++++++++++++++++++++----
> >>  kernel/bpf/syscall.c           | 56 ++++++++++++++++++++++------------
> >>  tools/include/uapi/linux/bpf.h |  6 ++++
> >>  5 files changed, 92 insertions(+), 27 deletions(-)
> >>
> >
> > [...]
> >
> >> #define BPF_ALL_CPU    0xFFFFFFFF
> >
> > at the very least we have to make it an enum, IMO. but I'm in general
> > unsure if we need it at all... and in any case, should it be named
> > "BPF_ALL_CPUS" (plural)?
> >
>
> To avoid using such special value, would it be better to update value
> across all CPUs when the cpu equals to num_possible_cpus()?

no, I'd keep special pattern (it's unnecessary complication to figure
out num_possible_cpus value), it's just the question whether to add an
enum for it in the UAPI or just document (u32)~0 as special case

[...]

> > [...]
> >
> >> @@ -1941,19 +1941,27 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
> >>  {
> >>         void __user *values = u64_to_user_ptr(attr->batch.values);
> >>         void __user *keys = u64_to_user_ptr(attr->batch.keys);
> >> +       u64 elem_flags = attr->batch.elem_flags;
> >>         u32 value_size, cp, max_count;
> >>         void *key, *value;
> >>         int err = 0;
> >>
> >> -       if (attr->batch.elem_flags & ~BPF_F_LOCK)
> >> +       if (elem_flags & ~(BPF_F_LOCK | BPF_F_CPU))
> >>                 return -EINVAL;
> >>
> >> -       if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> >> +       if ((elem_flags & BPF_F_LOCK) &&
> >>             !btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
> >>                 return -EINVAL;
> >>         }
> >>
> >> -       value_size = bpf_map_value_size(map);
> >> +       if (elem_flags & BPF_F_CPU) {
> >> +               if (map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
> >> +                       return -EINVAL;
> >> +
> >> +               value_size = round_up(map->value_size, 8);
> >> +       } else {
> >> +               value_size = bpf_map_value_size(map);
> >> +       }
> >
> > why not roll this into bpf_map_value_size() helper? it's internal,
> > should be fine
> >
>
> It's to avoid updating value_size by pointer like
>
> err = bpf_map_value_size(map, elem_flags, &value_size);
>
> However, it's OK for me to do so.

if you need to communicate error, then return negative value_size? but
alternatively just do error checking before bpf_map_value_size

[...]