Re: [PATCH v4 6/6] slab: Introduce kmalloc_nolock() and kfree_nolock().

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 08, 2025 at 05:08:43PM -0700, Alexei Starovoitov wrote:
> On Tue, Aug 12, 2025 at 10:08 AM Harry Yoo <harry.yoo@xxxxxxxxxx> wrote:
> 
> Sorry for the delay. I addressed all other comments
> and will respin soon.

No worries! Welcome back.

> Only below question remains..
> 
> > > > > >  {
> > > > > > @@ -3732,9 +3808,13 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> > > > > >       if (unlikely(!node_match(slab, node))) {
> > > > > >               /*
> > > > > >                * same as above but node_match() being false already
> > > > > > -              * implies node != NUMA_NO_NODE
> > > > > > +              * implies node != NUMA_NO_NODE.
> > > > > > +              * Reentrant slub cannot take locks necessary to
> > > > > > +              * deactivate_slab, hence ignore node preference.
> > > > >
> > > > > Now that we have defer_deactivate_slab(), we need to either update the
> > > > > code or comment?
> > > > >
> > > > > 1. Deactivate slabs when node / pfmemalloc mismatches
> > > > > or 2. Update comments to explain why it's still undesirable
> > > >
> > > > Well, defer_deactivate_slab() is a heavy hammer.
> > > > In !SLUB_TINY it pretty much never happens.
> > > >
> > > > This bit:
> > > >
> > > > retry_load_slab:
> > > >
> > > >         local_lock_cpu_slab(s, flags);
> > > >         if (unlikely(c->slab)) {
> > > >
> > > > is very rare. I couldn't trigger it at all in my stress test.
> > > >
> > > > But in this hunk the node mismatch is not rare, so ignoring node preference
> > > > for kmalloc_nolock() is a much better trade off.
> >
> > But users would have requested that specific node instead of
> > NUMA_NO_NODE because (at least) they think it's worth it.
> > (e.g., allocating kernel data structures tied to specified node)
> >
> > I don't understand why kmalloc()/kmem_cache_alloc() try harder
> > (by deactivating cpu slab) to respect the node parameter,
> > but kmalloc_nolock() does not.
> 
> Because kmalloc_nolock() tries to be as least intrusive as possible
> to kmalloc slabs that the rest of the kernel is using.
>
> There won't be kmem_cache_alloc _nolock() version, because
> the algorithm retries from a different bucket when the primary one
> is locked. So it's only kmalloc_nolock() flavor and it takes
> from generic kmalloc slab buckets with or without memcg.
>
> My understanding that c->slab is effectively a cache and in the long
> run all c->slab-s should be stable.

You're right and that's what makes it inefficient when users call
kmalloc_node() or kmem_cache_alloc_node() every time with different
node id because c->slab will be deactivated too often.

> A given cpu should be kmalloc-ing the memory suitable for this local cpu.
> In that sense deactivate_slab is a heavy hammer. kmalloc_nolock()
> is for users who cannot control their running context. imo such
> users shouldn't affect the cache property of c->slab hence ignoring
> node preference for !allow_spin is not great, but imo it's a better
> trade off than defer_deactivate_slab.

The assumption here is that calling kmalloc_node() with a specific
node other than the local node is a pretty niche case. And thus
kmalloc_nolock() does not want to affect existing kmalloc() users.

But given that assumption and your reasoning, even normal kmalloc_node()
(perhaps even kmem_cache_alloc_node()) users shouldn't fill c->slab with
a slab from a remote node then? Since most of users should be allocating
memory from the local node anyway.

> defer_deactivate_slab() is there for a rare race in retry_load_slab.
> It can be done for !node_match(c->slab, node) too,
> but it feels like a worse-r evil. Especially since kmalloc_nolock()
> doesn't support __GFP_THISNODE.

...maybe it is fair to ignore node preference for kmalloc_node()
in a sense that it isn't great to trylock n->list_lock and fail, then
allocate new slabs (even when there are partial slabs available for the
node).

-- 
Cheers,
Harry / Hyeonggon




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux