[GIT PULL] Introduce try_alloc_pages for 6.15

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Linus,

The following changes since commit 2014c95afecee3e76ca4a56956a936e23283f05b:

  Linux 6.14-rc1 (2025-02-02 15:39:26 -0800)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git tags/bpf_try_alloc_pages

for you to fetch changes up to f90b474a35744b5d43009e4fab232e74a3024cae:

  mm: Fix the flipped condition in gfpflags_allow_spinning() (2025-03-15 11:18:19 -0700)

----------------------------------------------------------------
Please pull after main MM changes.

The pull includes work from Sebastian, Vlastimil and myself
with a lot of help from Michal and Shakeel.
This is a first step towards making kmalloc reentrant to get rid
of slab wrappers: bpf_mem_alloc, kretprobe's objpool, etc.
These patches make page allocator safe from any context.

Vlastimil kicked off this effort at LSFMM 2024:
https://lwn.net/Articles/974138/
and we continued at LSFMM 2025:
https://lore.kernel.org/all/CAADnVQKfkGxudNUkcPJgwe3nTZ=xohnRshx9kLZBTmR_E1DFEg@xxxxxxxxxxxxxx/

Why:
SLAB wrappers bind memory to a particular subsystem
making it unavailable to the rest of the kernel.
Some BPF maps in production consume Gbytes of preallocated
memory. Top 5 in Meta: 1.5G, 1.2G, 1.1G, 300M, 200M.
Once we have kmalloc that works in any context BPF map
preallocation won't be necessary.

How:
Synchronous kmalloc/page alloc stack has multiple
stages going from fast to slow:
cmpxchg16 -> slab_alloc -> new_slab -> alloc_pages ->
rmqueue_pcplist -> __rmqueue.
rmqueue_pcplist was already relying on trylock.
This set changes rmqueue_bulk/rmqueue_buddy to attempt
a trylock and return ENOMEM if alloc_flags & ALLOC_TRYLOCK.
Then it wraps this functionality into try_alloc_pages() helper.
We make sure that the logic is sane in PREEMPT_RT.
End result: try_alloc_pages()/free_pages_nolock() are
safe to call from any context.
try_kmalloc() for any context with similar trylock
approach will follow. It will use try_alloc_pages()
when slab needs a new page.
Though such try_kmalloc/page_alloc() is an opportunistic
allocator, this design ensures that the probability of
successful allocation of small objects (up to one
page in size) is high.

Even before we have try_kmalloc(), we already use
try_alloc_pages() in BPF arena implementation and it's
going to be used more extensively in BPF.

Once the set was applied to bpf-next we ran into two
two small conflicts with MM tree as reported by Stephen:
https://lore.kernel.org/bpf/20250311120422.1d9a8f80@xxxxxxxxxxxxxxxx/
https://lore.kernel.org/bpf/20250312145247.380c2aa5@xxxxxxxxxxxxxxxx/

So Andrew suggested to keep thing as-is instead of moving
patchset between the trees before merge window:
https://lore.kernel.org/all/20250317132710.fbcde1c8bb66f91f36e78c89@xxxxxxxxxxxxxxxxxxxx/

Note "locking/local_lock: Introduce localtry_lock_t" patch is
later used in Vlastimil's sheaves and in Shakeel's changes.

Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
----------------------------------------------------------------
Alexei Starovoitov (6):
      mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
      mm, bpf: Introduce free_pages_nolock()
      memcg: Use trylock to access memcg stock_lock.
      mm, bpf: Use memcg in try_alloc_pages().
      bpf: Use try_alloc_pages() to allocate pages for bpf needs.
      Merge branch 'bpf-mm-introduce-try_alloc_pages'

Sebastian Andrzej Siewior (1):
      locking/local_lock: Introduce localtry_lock_t

Vlastimil Babka (1):
      mm: Fix the flipped condition in gfpflags_allow_spinning()

 include/linux/bpf.h                 |   2 +-
 include/linux/gfp.h                 |  23 ++++
 include/linux/local_lock.h          |  70 +++++++++++++
 include/linux/local_lock_internal.h | 146 ++++++++++++++++++++++++++
 include/linux/mm_types.h            |   4 +
 include/linux/mmzone.h              |   3 +
 kernel/bpf/arena.c                  |   5 +-
 kernel/bpf/syscall.c                |  23 +++-
 lib/stackdepot.c                    |  10 +-
 mm/internal.h                       |   1 +
 mm/memcontrol.c                     |  53 +++++++---
 mm/page_alloc.c                     | 203 +++++++++++++++++++++++++++++++++---
 mm/page_owner.c                     |   8 +-
 13 files changed, 509 insertions(+), 42 deletions(-)




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux