[PATCH slab v5 0/6] slab: Re-entrant kmalloc_nolock()

From: Alexei Starovoitov <ast@xxxxxxxxxx>

Overview:

This patch set introduces kmalloc_nolock(), the next logical step
toward allocation from any context, which is necessary to remove
bpf_mem_alloc and get rid of the preallocation requirement in the BPF
infrastructure. In production, BPF maps have grown to gigabytes in
size, and preallocation wastes memory. Allocating from any context
addresses this issue for BPF and for other subsystems that are forced
to preallocate as well.
This long task started with the introduction of alloc_pages_nolock();
then memcg and objcg were converted to operate from any context,
including NMI. This set completes the task with kmalloc_nolock(),
which builds on top of alloc_pages_nolock() and the memcg changes.
After that, the BPF subsystem will gradually adopt it everywhere.

The patch set is on top of slab/for-next, which already has the
pre-patch "locking/local_lock: Expose dep_map in local_trylock_t."
applied. I think the patch set should be routed via vbabka/slab.git.

v4->v5:
- New patch "Reuse first bit for OBJEXTS_ALLOC_FAIL" to free up a bit
  and use it to mark a slabobj_ext vector allocated with kmalloc_nolock(),
  so that the vector can later be freed with kfree_nolock()
- Call kasan_slab_free() directly from kfree_nolock() instead of deferring to
  do_slab_free() to avoid double poisoning
- Addressed other minor issues spotted by Harry

v4:
https://lore.kernel.org/all/20250718021646.73353-1-alexei.starovoitov@xxxxxxxxx/

v3->v4:
- Converted local_lock_cpu_slab() to macro
- Reordered patches 5 and 6
- Emphasized that kfree_nolock() shouldn't be used on kmalloc()-ed
  objects (see the sketch after this list)
- Addressed other comments and improved commit logs
- Fixed build issues reported by bots
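
To spell out the pairing rule from the bullet above, a short sketch
(the function is hypothetical, with the same assumed kmalloc_nolock()
signature as earlier): objects from kmalloc_nolock() can be freed
with kfree_nolock(), while objects from plain kmalloc() keep their
regular kfree() pairing:

#include <linux/slab.h>

static void pairing_example(void)
{
	void *a = kmalloc_nolock(32, 0, NUMA_NO_NODE);
	void *b = kmalloc(32, GFP_KERNEL);

	if (a)
		kfree_nolock(a);	/* ok: a came from kmalloc_nolock() */

	/* kfree_nolock(b) would be wrong: b came from kmalloc() */
	kfree(b);			/* regular pairing */
}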

v3:
https://lore.kernel.org/bpf/20250716022950.69330-1-alexei.starovoitov@xxxxxxxxx/

v2->v3:
- Adopted Sebastian's local_lock_cpu_slab(), but dropped gfpflags
  to avoid an extra branch for performance reasons, and added
  local_unlock_cpu_slab() for symmetry (see the sketch after this list)
- Dropped local_lock_lockdep_start/end() pair and switched to
  per kmem_cache lockdep class on PREEMPT_RT to silence false positive
  when the same cpu/task acquires two local_lock-s.
- Refactored defer_free per Sebastian's suggestion
- Fixed a slab leak when a slab needs to be deactivated via irq_work
  and llist, as Vlastimil proposed, including defer_free_barrier()
- Use kmem_cache->offset for the llist_node pointer when linking
  objects instead of zero offset, since the whole object may be in
  use for slabs with ctors and in other cases
- Fixed "cnt = 1; goto redo;" issue.
- Fixed slab leak in alloc_single_from_new_slab().
- Retested with slab_debug, RT, !RT, lockdep, kasan, slab_tiny
- Added acks to patches 1-4 that should be good to go.
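
For readers not following mm/slub.c closely, the adopted pair is
roughly the following sketch (illustrative, not the exact patch; the
real definitions may differ, e.g. in PREEMPT_RT handling):

/* Lock/unlock the per-CPU slab state. No gfpflags argument: it was
 * dropped to keep an extra branch off the fast path.
 */
#define local_lock_cpu_slab(s, flags) \
	local_lock_irqsave(&(s)->cpu_slab->lock, flags)

#define local_unlock_cpu_slab(s, flags) \
	local_unlock_irqrestore(&(s)->cpu_slab->lock, flags)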

v2:
https://lore.kernel.org/bpf/20250709015303.8107-1-alexei.starovoitov@xxxxxxxxx/

v1->v2:
Added more comments for this non-trivial logic and addressed earlier comments.
In particular:
- Introduce alloc_frozen_pages_nolock() to avoid refcnt race
- alloc_pages_nolock() defaults to GFP_COMP
- Support SLUB_TINY
- Added more variants to the stress tester and discovered that
  kfree_nolock() can OOM, because the deferred per-slab llist won't
  be serviced if kfree_nolock() gets unlucky for long enough.
  Scrapped the previous approach and switched to a global per-cpu
  llist with immediate irq_work_queue() to process all object sizes
  (see the sketch after this list).
- Reentrant kmalloc cannot deactivate_slab(). In v1 the node hint was
  downgraded to NUMA_NO_NODE before calling slab_alloc(). Realized
  that's not good enough: there are odd cases that can trigger
  deactivation. Rewrote this part.
- Struggled with SLAB_NO_CMPXCHG. Thankfully Harry had a great suggestion:
  https://lore.kernel.org/bpf/aFvfr1KiNrLofavW@hyeyoo/
  which was adopted, so slab_debug works now.
- In v1 I had to s/local_lock_irqsave/local_lock_irqsave_check/ in a
  bunch of places in mm/slub.c to avoid lockdep false positives.
  Came up with a much cleaner approach that silences invalid lockdep
  reports without sacrificing lockdep coverage. See
  local_lock_lockdep_start/end().
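
The per-cpu deferred-free machinery from the fourth bullet can be
pictured with the following hedged sketch (all names are
illustrative; the real code links the llist_node at
kmem_cache->offset inside the object, per the v2->v3 changelog above,
rather than at offset 0 as assumed here):

#include <linux/irq_work.h>
#include <linux/llist.h>
#include <linux/percpu.h>
#include <linux/slab.h>

struct defer_free {
	struct llist_head objects;
	struct irq_work work;
};

static void free_deferred_objects(struct irq_work *work)
{
	struct defer_free *d = container_of(work, struct defer_free, work);
	struct llist_node *pos, *t;

	/* Drain everything queued by unlucky kfree_nolock() callers. */
	llist_for_each_safe(pos, t, llist_del_all(&d->objects))
		kfree(pos);	/* sketch: assumes llist_node at offset 0 */
}

static DEFINE_PER_CPU(struct defer_free, defer_free_objects) = {
	.objects = LLIST_HEAD_INIT(objects),
	.work	 = IRQ_WORK_INIT(free_deferred_objects),
};

/* Called when kfree_nolock() cannot free the object immediately. */
static void defer_kfree(struct llist_node *llnode)
{
	struct defer_free *d = this_cpu_ptr(&defer_free_objects);

	llist_add(llnode, &d->objects);
	/* Queue immediately so the list is always serviced promptly. */
	irq_work_queue(&d->work);
}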

v1:
https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@xxxxxxxxx/

Alexei Starovoitov (6):
  locking/local_lock: Introduce local_lock_is_locked().
  mm: Allow GFP_ACCOUNT to be used in alloc_pages_nolock().
  mm: Introduce alloc_frozen_pages_nolock()
  slab: Make slub local_(try)lock more precise for LOCKDEP
  slab: Reuse first bit for OBJEXTS_ALLOC_FAIL
  slab: Introduce kmalloc_nolock() and kfree_nolock().

 include/linux/gfp.h                 |   2 +-
 include/linux/kasan.h               |  13 +-
 include/linux/local_lock.h          |   2 +
 include/linux/local_lock_internal.h |   7 +
 include/linux/memcontrol.h          |  12 +-
 include/linux/rtmutex.h             |  10 +
 include/linux/slab.h                |   4 +
 kernel/bpf/stream.c                 |   2 +-
 kernel/bpf/syscall.c                |   2 +-
 kernel/locking/rtmutex_common.h     |   9 -
 mm/Kconfig                          |   1 +
 mm/internal.h                       |   4 +
 mm/kasan/common.c                   |   5 +-
 mm/page_alloc.c                     |  55 ++--
 mm/slab.h                           |   7 +
 mm/slab_common.c                    |   3 +
 mm/slub.c                           | 495 +++++++++++++++++++++++++---
 17 files changed, 541 insertions(+), 92 deletions(-)

-- 
2.47.3




