On 9/9/25 03:00, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@xxxxxxxxxx>
>
> Overview:
>
> This patch set introduces kmalloc_nolock(), which is the next logical
> step towards any-context allocation, necessary to remove bpf_mem_alloc
> and get rid of the preallocation requirement in the BPF infrastructure.
> In production, BPF maps have grown to gigabytes in size, and
> preallocation wastes memory. Allocation from any context addresses this
> issue for BPF and for other subsystems that are forced to preallocate too.
> This long task started with the introduction of alloc_pages_nolock();
> then memcg and objcg were converted to operate from any context,
> including NMI. This set completes the task with kmalloc_nolock(),
> which builds on top of alloc_pages_nolock() and the memcg changes.
> After that, the BPF subsystem will gradually adopt it everywhere.
>
> The patch set is on top of slab/for-next, which already has the
> pre-patch "locking/local_lock: Expose dep_map in local_trylock_t." applied.
> I think the patch set should be routed via vbabka/slab.git.

Thanks, added to slab/for-next. There were no conflicts with mm-unstable
when tried locally.
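For anyone following along who hasn't read the last patch yet, here is
roughly the intended usage of the new API. This is only a sketch with
made-up caller names (my_node, any_ctx_alloc/free); the exact signatures,
the set of accepted gfp flags and the failure semantics are defined by
the kmalloc_nolock() patch itself, not by this example:

#include <linux/slab.h>

/* Hypothetical object of a caller that may run in NMI or with locks held. */
struct my_node {
        struct my_node *next;
        u64 key;
};

static struct my_node *any_ctx_alloc(u64 key)
{
        struct my_node *n;

        /*
         * kmalloc_nolock() never spins on slab or page locks, so it can be
         * called from any context, but it may return NULL when it cannot
         * make progress; the caller has to tolerate failure instead of
         * relying on a preallocated pool.
         */
        n = kmalloc_nolock(sizeof(*n), __GFP_ACCOUNT, NUMA_NO_NODE);
        if (!n)
                return NULL;

        n->key = key;
        return n;
}

static void any_ctx_free(struct my_node *n)
{
        /*
         * Objects obtained from kmalloc_nolock() are freed with
         * kfree_nolock(); as the v3->v4 note below says, kfree_nolock()
         * must not be used on objects that came from plain kmalloc().
         */
        kfree_nolock(n);
}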
> v4->v5:
> - New patch "Reuse first bit for OBJEXTS_ALLOC_FAIL" to free up a bit
> and use it to mark the slabobj_ext vector allocated with kmalloc_nolock(),
> so that freeing of the vector can be done with kfree_nolock()
> - Call kasan_slab_free() directly from kfree_nolock() instead of deferring
> to do_slab_free() to avoid double poisoning
> - Addressed other minor issues spotted by Harry
>
> v4:
> https://lore.kernel.org/all/20250718021646.73353-1-alexei.starovoitov@xxxxxxxxx/
>
> v3->v4:
> - Converted local_lock_cpu_slab() to a macro
> - Reordered patches 5 and 6
> - Emphasized that kfree_nolock() shouldn't be used on kmalloc()-ed objects
> - Addressed other comments and improved commit logs
> - Fixed build issues reported by bots
>
> v3:
> https://lore.kernel.org/bpf/20250716022950.69330-1-alexei.starovoitov@xxxxxxxxx/
>
> v2->v3:
> - Adopted Sebastian's local_lock_cpu_slab(), but dropped gfpflags
> to avoid an extra branch for performance reasons,
> and added local_unlock_cpu_slab() for symmetry.
> - Dropped the local_lock_lockdep_start/end() pair and switched to a
> per-kmem_cache lockdep class on PREEMPT_RT to silence a false positive
> when the same cpu/task acquires two local_lock-s.
> - Refactored defer_free per Sebastian's suggestion
> - Fixed a slab leak when the slab needs to be deactivated, via irq_work
> and llist as Vlastimil proposed. Including defer_free_barrier().
> - Use kmem_cache->offset for the llist_node pointer when linking objects
> instead of zero offset, since the whole object could be used for slabs
> with ctors and in other cases.
> - Fixed the "cnt = 1; goto redo;" issue.
> - Fixed a slab leak in alloc_single_from_new_slab().
> - Retested with slab_debug, RT, !RT, lockdep, kasan, slab_tiny
> - Added acks to patches 1-4 that should be good to go.
>
> v2:
> https://lore.kernel.org/bpf/20250709015303.8107-1-alexei.starovoitov@xxxxxxxxx/
>
> v1->v2:
> Added more comments for this non-trivial logic and addressed earlier
> comments. In particular:
> - Introduce alloc_frozen_pages_nolock() to avoid a refcnt race
> - alloc_pages_nolock() defaults to GFP_COMP
> - Support SLUB_TINY
> - Added more variants to the stress tester and discovered that
> kfree_nolock() can OOM, because the deferred per-slab llist won't be
> serviced if kfree_nolock() gets unlucky long enough. Scrapped the
> previous approach and switched to a global per-cpu llist with immediate
> irq_work_queue() to process all object sizes.
> - Reentrant kmalloc cannot deactivate_slab(). In v1 the node hint was
> downgraded to NUMA_NO_NODE before calling slab_alloc(). Realized it's
> not good enough: there are odd cases that can trigger deactivation.
> Rewrote this part.
> - Struggled with SLAB_NO_CMPXCHG. Thankfully Harry had a great suggestion:
> https://lore.kernel.org/bpf/aFvfr1KiNrLofavW@hyeyoo/
> which was adopted. So slab_debug works now.
> - In v1 I had to s/local_lock_irqsave/local_lock_irqsave_check/ in a bunch
> of places in mm/slub.c to avoid lockdep false positives.
> Came up with a much cleaner approach to silence invalid lockdep reports
> without sacrificing lockdep coverage. See local_lock_lockdep_start/end().
>
> v1:
> https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@xxxxxxxxx/
>
> Alexei Starovoitov (6):
>   locking/local_lock: Introduce local_lock_is_locked().
>   mm: Allow GFP_ACCOUNT to be used in alloc_pages_nolock().
>   mm: Introduce alloc_frozen_pages_nolock()
>   slab: Make slub local_(try)lock more precise for LOCKDEP
>   slab: Reuse first bit for OBJEXTS_ALLOC_FAIL
>   slab: Introduce kmalloc_nolock() and kfree_nolock().
>
>  include/linux/gfp.h                 |   2 +-
>  include/linux/kasan.h               |  13 +-
>  include/linux/local_lock.h          |   2 +
>  include/linux/local_lock_internal.h |   7 +
>  include/linux/memcontrol.h          |  12 +-
>  include/linux/rtmutex.h             |  10 +
>  include/linux/slab.h                |   4 +
>  kernel/bpf/stream.c                 |   2 +-
>  kernel/bpf/syscall.c                |   2 +-
>  kernel/locking/rtmutex_common.h     |   9 -
>  mm/Kconfig                          |   1 +
>  mm/internal.h                       |   4 +
>  mm/kasan/common.c                   |   5 +-
>  mm/page_alloc.c                     |  55 ++--
>  mm/slab.h                           |   7 +
>  mm/slab_common.c                    |   3 +
>  mm/slub.c                           | 495 +++++++++++++++++++++++++---
>  17 files changed, 541 insertions(+), 92 deletions(-)
>
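One more illustration, since the deferred freeing was the trickiest part
of the review: the rough shape of the "global per-cpu llist with
immediate irq_work_queue()" deferral described in the v1->v2 notes above.
This is not the code from the patch; all names are invented, and it
cheats by reusing the first word of the object for the llist_node,
whereas the real code links objects at kmem_cache->offset, as the
v2->v3 note explains:

#include <linux/llist.h>
#include <linux/irq_work.h>
#include <linux/percpu.h>
#include <linux/slab.h>

struct defer_free_ctx {
        struct llist_head objects;
        struct irq_work work;
};

static void defer_free_flush(struct irq_work *work)
{
        struct defer_free_ctx *ctx = container_of(work, struct defer_free_ctx, work);
        struct llist_node *pos, *t;

        /* Runs from a context where taking the slab locks is allowed again. */
        llist_for_each_safe(pos, t, llist_del_all(&ctx->objects))
                kfree(pos);     /* simplified: llist_node sits at offset 0 */
}

static DEFINE_PER_CPU(struct defer_free_ctx, defer_ctx) = {
        .objects = LLIST_HEAD_INIT(objects),
        .work    = IRQ_WORK_INIT(defer_free_flush),
};

/* Called when the object cannot be freed in the current context. */
static void defer_free(void *obj)
{
        /* Caller is not preemptible (NMI or irqs off), so the cpu is stable. */
        struct defer_free_ctx *ctx = this_cpu_ptr(&defer_ctx);

        /* llist_add() and irq_work_queue() are lockless and NMI-safe. */
        llist_add((struct llist_node *)obj, &ctx->objects);
        irq_work_queue(&ctx->work);
}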