Hi Linus,

The following changes since commit 2014c95afecee3e76ca4a56956a936e23283f05b:

  Linux 6.14-rc1 (2025-02-02 15:39:26 -0800)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git tags/bpf_try_alloc_pages

for you to fetch changes up to f90b474a35744b5d43009e4fab232e74a3024cae:

  mm: Fix the flipped condition in gfpflags_allow_spinning() (2025-03-15 11:18:19 -0700)

----------------------------------------------------------------
Please pull after the main MM changes.

The pull includes work from Sebastian, Vlastimil and myself, with a lot
of help from Michal and Shakeel. This is a first step towards making
kmalloc reentrant in order to get rid of slab wrappers: bpf_mem_alloc,
kretprobe's objpool, etc. These patches make the page allocator safe to
call from any context.

Vlastimil kicked off this effort at LSFMM 2024:
https://lwn.net/Articles/974138/
and we continued at LSFMM 2025:
https://lore.kernel.org/all/CAADnVQKfkGxudNUkcPJgwe3nTZ=xohnRshx9kLZBTmR_E1DFEg@xxxxxxxxxxxxxx/

Why:
SLAB wrappers bind memory to a particular subsystem, making it
unavailable to the rest of the kernel. Some BPF maps in production
consume Gbytes of preallocated memory. Top 5 in Meta: 1.5G, 1.2G, 1.1G,
300M, 200M. Once we have a kmalloc that works in any context, BPF map
preallocation won't be necessary.

How:
The synchronous kmalloc/page alloc stack has multiple stages going from
fast to slow: cmpxchg16 -> slab_alloc -> new_slab -> alloc_pages ->
rmqueue_pcplist -> __rmqueue. rmqueue_pcplist was already relying on
trylock. This set changes rmqueue_bulk/rmqueue_buddy to attempt a
trylock and return ENOMEM if alloc_flags & ALLOC_TRYLOCK, and then
wraps this functionality into the try_alloc_pages() helper. We make
sure that the logic is sane on PREEMPT_RT.

End result: try_alloc_pages()/free_pages_nolock() are safe to call from
any context (an illustration-only usage sketch follows below, before
the shortlog).

try_kmalloc() for any context, using a similar trylock approach, will
follow. It will use try_alloc_pages() when slab needs a new page.
Though such a try_kmalloc/page_alloc() is an opportunistic allocator,
this design ensures that the probability of successfully allocating
small objects (up to one page in size) is high.

Even before we have try_kmalloc(), we already use try_alloc_pages() in
the BPF arena implementation, and it is going to be used more
extensively in BPF.

Once the set was applied to bpf-next we ran into two small conflicts
with the MM tree, as reported by Stephen:
https://lore.kernel.org/bpf/20250311120422.1d9a8f80@xxxxxxxxxxxxxxxx/
https://lore.kernel.org/bpf/20250312145247.380c2aa5@xxxxxxxxxxxxxxxx/
so Andrew suggested keeping things as-is instead of moving the patchset
between the trees before the merge window:
https://lore.kernel.org/all/20250317132710.fbcde1c8bb66f91f36e78c89@xxxxxxxxxxxxxxxxxxxx/

Note: the "locking/local_lock: Introduce localtry_lock_t" patch is later
used in Vlastimil's sheaves and in Shakeel's changes (an illustration-only
sketch of the trylock pattern it enables follows after the diffstat).

Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
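
For illustration only (not part of the series' diff): a minimal sketch of
how a caller in a restricted context might use the new helpers. It assumes
the signatures introduced by this series, i.e. try_alloc_pages(nid, order)
and free_pages_nolock(page, order); the helper names below are hypothetical.

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/topology.h>

/*
 * Opportunistic order-0 allocation from a context where taking zone/pcp
 * locks may not be safe (NMI, tracing, or with other locks held).
 * try_alloc_pages() only ever trylocks and returns NULL instead of
 * spinning or sleeping, so the caller must tolerate failure.
 */
static void *grab_scratch_page(void)		/* hypothetical helper */
{
	struct page *page;

	page = try_alloc_pages(numa_node_id(), 0);
	if (!page)
		return NULL;	/* fall back or retry from a saner context */

	return page_address(page);
}

static void drop_scratch_page(void *addr)	/* hypothetical helper */
{
	if (addr)
		free_pages_nolock(virt_to_page(addr), 0);
}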
----------------------------------------------------------------
Alexei Starovoitov (6):
      mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
      mm, bpf: Introduce free_pages_nolock()
      memcg: Use trylock to access memcg stock_lock.
      mm, bpf: Use memcg in try_alloc_pages().
      bpf: Use try_alloc_pages() to allocate pages for bpf needs.
      Merge branch 'bpf-mm-introduce-try_alloc_pages'

Sebastian Andrzej Siewior (1):
      locking/local_lock: Introduce localtry_lock_t

Vlastimil Babka (1):
      mm: Fix the flipped condition in gfpflags_allow_spinning()

 include/linux/bpf.h                 |   2 +-
 include/linux/gfp.h                 |  23 ++++
 include/linux/local_lock.h          |  70 +++++++++++++
 include/linux/local_lock_internal.h | 146 ++++++++++++++++++++++++++
 include/linux/mm_types.h            |   4 +
 include/linux/mmzone.h              |   3 +
 kernel/bpf/arena.c                  |   5 +-
 kernel/bpf/syscall.c                |  23 +++-
 lib/stackdepot.c                    |  10 +-
 mm/internal.h                       |   1 +
 mm/memcontrol.c                     |  53 +++++++---
 mm/page_alloc.c                     | 203 +++++++++++++++++++++++++++++++++---
 mm/page_owner.c                     |   8 +-
 13 files changed, 509 insertions(+), 42 deletions(-)
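
Appendix, for illustration only (not part of the diff): a sketch of the
trylock-with-fallback pattern referred to above, in the spirit of the memcg
stock_lock conversion. It assumes the localtry_lock_t API from Sebastian's
patch (INIT_LOCALTRY_LOCK, localtry_trylock_irqsave, localtry_lock_irqsave,
localtry_unlock_irqrestore) and gfpflags_allow_spinning() from this series;
my_stock and consume_from_stock() are made-up names.

#include <linux/gfp.h>
#include <linux/local_lock.h>
#include <linux/percpu.h>

/* Hypothetical per-CPU cache protected by the new localtry_lock_t. */
struct my_stock_pcp {
	localtry_lock_t	lock;
	unsigned int	cached;
};

static DEFINE_PER_CPU(struct my_stock_pcp, my_stock) = {
	.lock = INIT_LOCALTRY_LOCK(lock),
};

static bool consume_from_stock(gfp_t gfp_mask)	/* hypothetical */
{
	struct my_stock_pcp *stock;
	unsigned long flags;
	bool ret = false;

	if (!localtry_trylock_irqsave(&my_stock.lock, flags)) {
		/*
		 * Restricted callers (e.g. coming via try_alloc_pages())
		 * pass gfp flags that do not allow spinning: give up.
		 */
		if (!gfpflags_allow_spinning(gfp_mask))
			return ret;
		/* Regular callers may still acquire the lock as before. */
		localtry_lock_irqsave(&my_stock.lock, flags);
	}

	stock = this_cpu_ptr(&my_stock);
	if (stock->cached) {
		stock->cached--;
		ret = true;
	}

	localtry_unlock_irqrestore(&my_stock.lock, flags);
	return ret;
}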