On Tue, Aug 12, 2025 at 10:08 AM Harry Yoo <harry.yoo@xxxxxxxxxx> wrote:
> Sorry for the delay. I addressed all other comments and will respin soon.
> Only below question remains..
>
> > > > > {
> > > > > @@ -3732,9 +3808,13 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> > > > >         if (unlikely(!node_match(slab, node))) {
> > > > >                 /*
> > > > >                  * same as above but node_match() being false already
> > > > > -                * implies node != NUMA_NO_NODE
> > > > > +                * implies node != NUMA_NO_NODE.
> > > > > +                * Reentrant slub cannot take locks necessary to
> > > > > +                * deactivate_slab, hence ignore node preference.
> > > >
> > > > Now that we have defer_deactivate_slab(), we need to either update the
> > > > code or comment?
> > > >
> > > > 1. Deactivate slabs when node / pfmemalloc mismatches
> > > > or 2. Update comments to explain why it's still undesirable
> > >
> > > Well, defer_deactivate_slab() is a heavy hammer.
> > > In !SLUB_TINY it pretty much never happens.
> > >
> > > This bit:
> > >
> > > retry_load_slab:
> > >
> > >         local_lock_cpu_slab(s, flags);
> > >         if (unlikely(c->slab)) {
> > >
> > > is very rare. I couldn't trigger it at all in my stress test.
> > >
> > > But in this hunk the node mismatch is not rare, so ignoring node preference
> > > for kmalloc_nolock() is a much better trade off.
>
> But users would have requested that specific node instead of
> NUMA_NO_NODE because (at least) they think it's worth it.
> (e.g., allocating kernel data structures tied to specified node)
>
> I don't understand why kmalloc()/kmem_cache_alloc() try harder
> (by deactivating cpu slab) to respect the node parameter,
> but kmalloc_nolock() does not.

Because kmalloc_nolock() tries to be as unintrusive as possible to the
kmalloc slabs that the rest of the kernel is using.

There won't be a kmem_cache_alloc_nolock() version, because the algorithm
retries from a different bucket when the primary one is locked.
So it's only the kmalloc_nolock() flavor, and it takes from the generic
kmalloc slab buckets, with or without memcg.

My understanding is that c->slab is effectively a cache, and in the long
run all c->slab-s should be stable: a given cpu should be kmalloc-ing
memory suitable for that local cpu. In that sense deactivate_slab() is a
heavy hammer.

kmalloc_nolock() is for users who cannot control their running context.
imo such users shouldn't affect the cache property of c->slab, hence
ignoring node preference for !allow_spin is not great, but imo it's a
better trade off than defer_deactivate_slab().

defer_deactivate_slab() is there for a rare race in retry_load_slab.
It could be done for !node_match(c->slab, node) too, but it feels like
the worse evil, especially since kmalloc_nolock() doesn't support
__GFP_THISNODE.
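To make the trade-off concrete, here is a toy userspace model of the
node-mismatch branch. This is not the actual SLUB code; the names
(toy_cpu_slab, toy_alloc) and the way allow_spin is threaded through are
illustrative only. The point it models: a spinning caller deactivates the
cpu slab on a node mismatch and refills from the requested node, while a
!allow_spin (kmalloc_nolock) caller cannot take the locks deactivation
needs and simply ignores the node preference:

```c
#include <stdbool.h>

#define TOY_NUMA_NO_NODE (-1)

/* Toy model of a per-cpu slab: only tracks which NUMA node it came from. */
struct toy_cpu_slab {
	int node;          /* node the current c->slab belongs to */
	int deactivations; /* how many times the cpu slab was dropped */
};

/*
 * Toy version of the node_match() branch:
 * - allow_spin callers (regular kmalloc) deactivate the cpu slab on a
 *   node mismatch and refill from the requested node (the heavy hammer);
 * - !allow_spin callers (kmalloc_nolock) keep c->slab and ignore the
 *   node preference, preserving the cache property of c->slab.
 * Returns the node the allocation actually came from.
 */
static int toy_alloc(struct toy_cpu_slab *c, int node, bool allow_spin)
{
	if (node != TOY_NUMA_NO_NODE && c->node != node) {
		if (allow_spin) {
			c->deactivations++; /* drop the cpu slab */
			c->node = node;     /* refill from requested node */
		}
		/* else: reentrant path, allocation stays on c->node */
	}
	return c->node;
}
```

In this model the !allow_spin caller never perturbs c->slab, which is the
"least intrusive" property argued for above; the cost is that its
allocation may land on the wrong node.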