This patch set introduces the BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu
maps. The need for a BPF_F_ALL_CPUS flag on percpu_array maps was discussed
in the thread "[PATCH bpf-next v3 0/4] bpf: Introduce global percpu data"[1].

The goal of the BPF_F_ALL_CPUS flag is to reduce data caching overhead in
light skeletons by allowing a single value to be reused to update the values
across all CPUs. This avoids the M:N problem where M cached values are used
to update a map on a kernel with N CPUs.

The BPF_F_CPU flag is accompanied by cpu info embedded in *flags*, which
specifies the target CPU for the operation:

* For lookup operations: the flag and cpu info enable querying the value on
  the specified CPU.
* For update operations: the flag and cpu info enable updating the value for
  the specified CPU.

A minimal usage sketch follows the changelog below.

Links:
[1] https://lore.kernel.org/bpf/20250526162146.24429-1-leon.hwang@xxxxxxxxx/

Changes:
v6 -> v7:
* Get correct value size for percpu_hash and lru_percpu_hash in update_batch
  API.
* Set 'count' as 'max_entries' in test cases for lookup_batch API.
* Address comment from Alexei:
  * Move cpu flags check into bpf_map_check_op_flags().

v5 -> v6:
* Move bpf_map_check_op_flags() from 'bpf.h' to 'syscall.c'.
* Address comments from Alexei:
  * Drop the refactoring code of data copying logic for percpu maps.
  * Drop bpf_map_check_op_flags() wrappers.

v4 -> v5:
* Address comments from Andrii:
  * Refactor data copying logic for all percpu maps.
  * Drop this_cpu_ptr() micro-optimization.
  * Drop cpu check in libbpf's validate_map_op().
  * Enhance bpf_map_check_op_flags() using *allowed flags* instead of
    'extra_flags_mask'.

v3 -> v4:
* Address comments from Andrii:
  * Remove unnecessary map_type check in bpf_map_value_size().
  * Reduce code churn.
  * Remove unnecessary do_delete check in
    __htab_map_lookup_and_delete_batch().
  * Introduce bpf_percpu_copy_to_user() and bpf_percpu_copy_from_user().
  * Rename check_map_flags() to bpf_map_check_op_flags() with
    extra_flags_mask.
  * Add human-readable pr_warn() explanations in validate_map_op().
  * Use flags in bpf_map__delete_elem() and
    bpf_map__lookup_and_delete_elem().
  * Drop "for alignment reasons".
v3 link: https://lore.kernel.org/bpf/20250821160817.70285-1-leon.hwang@xxxxxxxxx/

v2 -> v3:
* Address comments from Alexei:
  * Use BPF_F_ALL_CPUS instead of BPF_ALL_CPUS magic.
  * Introduce these two cpu flags for all percpu maps.
* Address comments from Jiri:
  * Reduce some unnecessary u32 casts.
  * Refactor a more generic map flags check function.
  * Fix a code style issue.
v2 link: https://lore.kernel.org/bpf/20250805163017.17015-1-leon.hwang@xxxxxxxxx/

v1 -> v2:
* Address comments from Andrii:
  * Embed cpu info entirely in the high 32 bits of *flags*.
  * Use ERANGE instead of E2BIG.
  * Fix a few format issues.
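For illustration, here is a minimal userspace sketch of the intended usage.
It assumes the libbpf support added in this series and the single-value
semantics described above; the map, key, and values are made up for the
example:

#include <bpf/bpf.h>
#include <bpf/libbpf.h>

static int demo(struct bpf_map *percpu_map)
{
	__u32 key = 0, cpu = 3;
	__u64 val = 42, out = 0;
	int err;

	/* One value applied to every CPU; no value_size * nr_cpus
	 * buffer is needed.
	 */
	err = bpf_map__update_elem(percpu_map, &key, sizeof(key),
				   &val, sizeof(val), BPF_F_ALL_CPUS);
	if (err)
		return err;

	/* The target cpu is embedded in the high 32 bits of *flags*
	 * alongside BPF_F_CPU, so only that CPU's value is read.
	 */
	return bpf_map__lookup_elem(percpu_map, &key, sizeof(key),
				    &out, sizeof(out),
				    BPF_F_CPU | ((__u64)cpu << 32));
}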
Leon Hwang (7):
  bpf: Introduce internal bpf_map_check_op_flags helper function
  bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags
  bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array
    maps
  bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_hash
    and lru_percpu_hash maps
  bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for
    percpu_cgroup_storage maps
  libbpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu maps
  selftests/bpf: Add cases to test BPF_F_CPU and BPF_F_ALL_CPUS flags

 include/linux/bpf-cgroup.h                    |   4 +-
 include/linux/bpf.h                           |  44 +++-
 include/uapi/linux/bpf.h                      |   2 +
 kernel/bpf/arraymap.c                         |  24 +-
 kernel/bpf/hashtab.c                          |  77 ++++--
 kernel/bpf/local_storage.c                    |  22 +-
 kernel/bpf/syscall.c                          |  65 +++--
 tools/include/uapi/linux/bpf.h                |   2 +
 tools/lib/bpf/bpf.h                           |   8 +
 tools/lib/bpf/libbpf.c                        |  26 +-
 tools/lib/bpf/libbpf.h                        |  21 +-
 .../selftests/bpf/prog_tests/percpu_alloc.c   | 233 ++++++++++++++++++
 .../selftests/bpf/progs/percpu_alloc_array.c  |  32 +++
 13 files changed, 471 insertions(+), 89 deletions(-)

-- 
2.50.1