From: Alexei Starovoitov <ast@xxxxxxxxxx>

Overview:

This patch set introduces kmalloc_nolock(), the next logical step
towards any-context allocation, which is necessary to remove
bpf_mem_alloc and get rid of the preallocation requirement in the BPF
infrastructure. In production, BPF maps have grown to gigabytes in
size, and preallocation wastes memory. Allocation from any context
addresses this issue for BPF and for other subsystems that are forced
to preallocate too.

This long task started with the introduction of alloc_pages_nolock();
then memcg and objcg were converted to operate from any context,
including NMI. This set completes the task with kmalloc_nolock(),
which builds on top of alloc_pages_nolock() and the memcg changes.
After that, the BPF subsystem will gradually adopt it everywhere.

The patch set is on top of slab/for-next, which already has the
pre-patch "locking/local_lock: Expose dep_map in local_trylock_t."
applied.

I think the patch set should be routed via vbabka/slab.git.
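For illustration, a sketch of how a caller in a restricted context
(NMI, raw spinlock held) is expected to use the new API. The helper
names below are hypothetical, the prototype is assumed to mirror
kmalloc_node(), and __GFP_ZERO is assumed to be an accepted flag; see
patch 6 for the actual interface:

/*
 * Illustrative sketch only; alloc_elem()/free_elem() and struct elem
 * are hypothetical. Needs <linux/slab.h> for the prototypes.
 */
struct elem {
	u64 payload;
};

static struct elem *alloc_elem(int node)
{
	/*
	 * Usable from any context, including NMI and with raw
	 * spinlocks held. May return NULL when the per-cpu slab is
	 * contended, so the caller must tolerate failure.
	 */
	return kmalloc_nolock(sizeof(struct elem), __GFP_ZERO, node);
}

static void free_elem(struct elem *e)
{
	/*
	 * Pairs with kmalloc_nolock(). As noted in v3->v4 below,
	 * kfree_nolock() must not be used on kmalloc()-ed objects.
	 */
	kfree_nolock(e);
}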
v4->v5:
- New patch "Reuse first bit for OBJEXTS_ALLOC_FAIL" to free up a bit
  and use it to mark a slabobj_ext vector allocated with
  kmalloc_nolock(), so that freeing of the vector can be done with
  kfree_nolock()
- Call kasan_slab_free() directly from kfree_nolock() instead of
  deferring to do_slab_free(), to avoid double poisoning
- Addressed other minor issues spotted by Harry

v4: https://lore.kernel.org/all/20250718021646.73353-1-alexei.starovoitov@xxxxxxxxx/

v3->v4:
- Converted local_lock_cpu_slab() to a macro
- Reordered patches 5 and 6
- Emphasized that kfree_nolock() shouldn't be used on kmalloc()-ed
  objects
- Addressed other comments and improved commit logs
- Fixed build issues reported by bots

v3: https://lore.kernel.org/bpf/20250716022950.69330-1-alexei.starovoitov@xxxxxxxxx/

v2->v3:
- Adopted Sebastian's local_lock_cpu_slab(), but dropped gfpflags to
  avoid an extra branch for performance reasons, and added
  local_unlock_cpu_slab() for symmetry.
- Dropped the local_lock_lockdep_start/end() pair and switched to a
  per-kmem_cache lockdep class on PREEMPT_RT to silence a false
  positive when the same cpu/task acquires two local_lock-s.
- Refactored defer_free per Sebastian's suggestion.
- Fixed a slab leak when a slab needs to be deactivated via irq_work
  and llist, as Vlastimil proposed. Including defer_free_barrier().
- Use kmem_cache->offset for the llist_node pointer when linking
  objects instead of zero offset, since the whole object could be in
  use for slabs with ctors and in other cases.
- Fixed the "cnt = 1; goto redo;" issue.
- Fixed a slab leak in alloc_single_from_new_slab().
- Retested with slab_debug, RT, !RT, lockdep, kasan, slab_tiny.
- Added acks to patches 1-4 that should be good to go.

v2: https://lore.kernel.org/bpf/20250709015303.8107-1-alexei.starovoitov@xxxxxxxxx/

v1->v2:
Added more comments for this non-trivial logic and addressed earlier
comments. In particular:
- Introduce alloc_frozen_pages_nolock() to avoid a refcnt race.
- alloc_pages_nolock() defaults to GFP_COMP.
- Support SLUB_TINY.
- Added more variants to the stress tester and discovered that
  kfree_nolock() can OOM, because the deferred per-slab llist won't be
  serviced if kfree_nolock() gets unlucky long enough. Scrapped the
  previous approach and switched to a global per-cpu llist with
  immediate irq_work_queue() to process all object sizes (a generic
  sketch of this pattern follows the diffstat below).
- Reentrant kmalloc cannot deactivate_slab(). In v1 the node hint was
  downgraded to NUMA_NO_NODE before calling slab_alloc(). Realized
  that's not good enough: there are odd cases that can trigger
  deactivation. Rewrote this part.
- Struggled with SLAB_NO_CMPXCHG. Thankfully Harry had a great
  suggestion: https://lore.kernel.org/bpf/aFvfr1KiNrLofavW@hyeyoo/
  which was adopted, so slab_debug works now.
- In v1 I had to s/local_lock_irqsave/local_lock_irqsave_check/ in a
  bunch of places in mm/slub.c to avoid lockdep false positives. Came
  up with a much cleaner approach to silence invalid lockdep reports
  without sacrificing lockdep coverage. See
  local_lock_lockdep_start/end().

v1: https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@xxxxxxxxx/

Alexei Starovoitov (6):
  locking/local_lock: Introduce local_lock_is_locked().
  mm: Allow GFP_ACCOUNT to be used in alloc_pages_nolock().
  mm: Introduce alloc_frozen_pages_nolock()
  slab: Make slub local_(try)lock more precise for LOCKDEP
  slab: Reuse first bit for OBJEXTS_ALLOC_FAIL
  slab: Introduce kmalloc_nolock() and kfree_nolock().

 include/linux/gfp.h                 |   2 +-
 include/linux/kasan.h               |  13 +-
 include/linux/local_lock.h          |   2 +
 include/linux/local_lock_internal.h |   7 +
 include/linux/memcontrol.h          |  12 +-
 include/linux/rtmutex.h             |  10 +
 include/linux/slab.h                |   4 +
 kernel/bpf/stream.c                 |   2 +-
 kernel/bpf/syscall.c                |   2 +-
 kernel/locking/rtmutex_common.h     |   9 -
 mm/Kconfig                          |   1 +
 mm/internal.h                       |   4 +
 mm/kasan/common.c                   |   5 +-
 mm/page_alloc.c                     |  55 ++--
 mm/slab.h                           |   7 +
 mm/slab_common.c                    |   3 +
 mm/slub.c                           | 495 +++++++++++++++++++++++++---
 17 files changed, 541 insertions(+), 92 deletions(-)

-- 
2.47.3
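Sketch of the deferred-free pattern referenced in the v1->v2 notes: a
global per-cpu llist drained from irq_work. All identifiers here are
hypothetical and the object is linked at offset zero for simplicity;
the real code links at kmem_cache->offset as noted in v2->v3.

#include <linux/llist.h>
#include <linux/irq_work.h>
#include <linux/percpu.h>
#include <linux/slab.h>

static void defer_free_fn(struct irq_work *work);

static DEFINE_PER_CPU(struct llist_head, defer_free_list);
static DEFINE_PER_CPU(struct irq_work, defer_free_work) =
	IRQ_WORK_INIT(defer_free_fn);

/* Called when the free path cannot take the slab locks. */
static void defer_free(void *object)
{
	/* The freed object's memory itself is reused as the llist_node. */
	llist_add(object, this_cpu_ptr(&defer_free_list));
	/* Queue the irq_work immediately so the list is always serviced. */
	irq_work_queue(this_cpu_ptr(&defer_free_work));
}

/* Runs in IRQ context shortly after, where the normal free path is safe. */
static void defer_free_fn(struct irq_work *work)
{
	struct llist_node *head, *pos, *t;

	head = llist_del_all(this_cpu_ptr(&defer_free_list));
	llist_for_each_safe(pos, t, head)
		kfree(pos);
}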