Alexei Starovoitov wrote:
> On Wed, Jun 18, 2025 at 6:50 AM Anton Protopopov
> <a.s.protopopov@xxxxxxxxx> wrote:
> >
> > On 25/06/16 10:38AM, Willem de Bruijn wrote:
> > > From: Willem de Bruijn <willemb@xxxxxxxxxx>
> > >
> > > BPF_MAP_TYPE_LRU_HASH can recycle most recent elements well before the
> > > map is full, due to percpu reservations and force shrink before
> > > neighbor stealing. Once a CPU is unable to borrow from the global map,
> > > it will steal one elem from a neighbor once, and from then on each time
> > > flush this one element to the global list and immediately recycle it.
> > >
> > > Batch value LOCAL_FREE_TARGET (128) will exhaust a 10K element map
> > > with 79 CPUs. CPU 79 will observe this behavior even while its
> > > neighbors hold 78 * 127 + 1 * 15 == 9921 free elements (99%).
> > >
> > > CPUs need not be active concurrently. The issue can appear with
> > > affinity migration, e.g., irqbalance. Each CPU can reserve and then
> > > hold onto its 128 elements indefinitely.
> > >
> > > Avoid global list exhaustion by limiting aggregate percpu caches to
> > > half of the map size, by adjusting LOCAL_FREE_TARGET based on cpu count.
> > > This change has no effect on sufficiently large tables.
> > >
> > > Similar to LOCAL_NR_SCANS and lru->nr_scans, introduce a map variable
> > > lru->free_target. The extra field fits in a hole in struct bpf_lru.
> > > The cacheline is already warm where read in the hot path. The field is
> > > only accessed with the lru lock held.
> >
> > Hi Willem! The patch looks very reasonable. I've bumped into this
> > issue before (see https://lore.kernel.org/bpf/ZJwy478jHkxYNVMc@zh-lab-node-5/)
> > but didn't follow up, as we typically have large enough LRU maps.
> >
> > I've tested your patch (with a patched map_tests/map_percpu_stats.c
> > selftest); it works as expected for small maps. E.g., before your patch
> > a map of size 4096 after being updated 2176 times from 32 threads on 32
> > CPUs contains around 150 elements, after your patch around (expected)
> > 2100 elements.
> >
> > Tested-by: Anton Protopopov <a.s.protopopov@xxxxxxxxx>
>
> Looks like we have consensus. Great.

Thanks for the reviews and testing. Good to have more data that the
issue is well understood and the approach helps.

> Willem,
> please target bpf tree when you respin.

Done: https://lore.kernel.org/bpf/20250618215803.3587312-1-willemdebruijn.kernel@xxxxxxxxx/T/#u
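
[Editor's note: for reference, below is a minimal standalone sketch of the
clamping idea described in the quoted commit message, i.e. scaling the
per-cpu refill batch by CPU count so aggregate percpu caches stay at or
below half the map. The helper name compute_free_target and the exact
clamp form are illustrative assumptions, not the actual kernel patch.]

#include <stdio.h>

#define LOCAL_FREE_TARGET 128	/* default per-cpu refill batch */

/*
 * Cap the per-cpu refill batch so that nr_cpus * target stays at or
 * below half the map, preventing per-cpu reservations alone from
 * exhausting the global free list. Illustrative sketch only.
 */
static unsigned int compute_free_target(unsigned int map_size,
					unsigned int nr_cpus)
{
	unsigned int target = map_size / nr_cpus / 2;

	if (target < 1)
		target = 1;
	if (target > LOCAL_FREE_TARGET)
		target = LOCAL_FREE_TARGET;
	return target;
}

int main(void)
{
	/* 10K-element map on 79 CPUs: target drops from 128 to 63. */
	printf("%u\n", compute_free_target(10000, 79));
	/* Large map: clamped back to the default 128, i.e. no change. */
	printf("%u\n", compute_free_target(1 << 20, 79));
	return 0;
}

With a sufficiently large map the computed value exceeds the default 128
and is clamped back to it, which matches the commit message's statement
that the change has no effect on sufficiently large tables.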