On Mon, Sep 08, 2025 at 05:08:43PM -0700, Alexei Starovoitov wrote:
> On Tue, Aug 12, 2025 at 10:08 AM Harry Yoo <harry.yoo@xxxxxxxxxx> wrote:
>
> Sorry for the delay. I addressed all other comments
> and will respin soon.

No worries! Welcome back.

> Only below question remains..
>
> > > > > > {
> > > > > > @@ -3732,9 +3808,13 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> > > > > >  	if (unlikely(!node_match(slab, node))) {
> > > > > >  		/*
> > > > > >  		 * same as above but node_match() being false already
> > > > > > -		 * implies node != NUMA_NO_NODE
> > > > > > +		 * implies node != NUMA_NO_NODE.
> > > > > > +		 * Reentrant slub cannot take locks necessary to
> > > > > > +		 * deactivate_slab, hence ignore node preference.
> > > > >
> > > > > Now that we have defer_deactivate_slab(), we need to either update the
> > > > > code or comment?
> > > > >
> > > > > 1. Deactivate slabs when node / pfmemalloc mismatches
> > > > > or 2. Update comments to explain why it's still undesirable
> > > >
> > > > Well, defer_deactivate_slab() is a heavy hammer.
> > > > In !SLUB_TINY it pretty much never happens.
> > > >
> > > > This bit:
> > > >
> > > > retry_load_slab:
> > > >
> > > >         local_lock_cpu_slab(s, flags);
> > > >         if (unlikely(c->slab)) {
> > > >
> > > > is very rare. I couldn't trigger it at all in my stress test.
> > > >
> > > > But in this hunk the node mismatch is not rare, so ignoring node preference
> > > > for kmalloc_nolock() is a much better trade off.
> >
> > But users would have requested that specific node instead of
> > NUMA_NO_NODE because (at least) they think it's worth it
> > (e.g., allocating kernel data structures tied to the specified node).
> >
> > I don't understand why kmalloc()/kmem_cache_alloc() try harder
> > (by deactivating the cpu slab) to respect the node parameter,
> > but kmalloc_nolock() does not.
>
> Because kmalloc_nolock() tries to be as minimally intrusive as possible
> to the kmalloc slabs that the rest of the kernel is using.
>
> There won't be a kmem_cache_alloc_nolock() version, because
> the algorithm retries from a different bucket when the primary one
> is locked. So it's only the kmalloc_nolock() flavor, and it takes
> from generic kmalloc slab buckets with or without memcg.
>
> My understanding is that c->slab is effectively a cache and in the
> long run all c->slab-s should be stable.

You're right, and that's what makes it inefficient when users call
kmalloc_node() or kmem_cache_alloc_node() with a different node id
every time, because c->slab will be deactivated too often.

> A given cpu should be kmalloc-ing the memory suitable for this local cpu.
> In that sense deactivate_slab is a heavy hammer. kmalloc_nolock()
> is for users who cannot control their running context. imo such
> users shouldn't affect the cache property of c->slab, hence ignoring
> node preference for !allow_spin is not great, but imo it's a better
> trade off than defer_deactivate_slab.

The assumption here is that calling kmalloc_node() with a specific node
other than the local node is a pretty niche case, and thus
kmalloc_nolock() does not want to affect existing kmalloc() users.

But given that assumption and your reasoning, shouldn't even normal
kmalloc_node() (perhaps even kmem_cache_alloc_node()) users avoid
filling c->slab with a slab from a remote node? Most users should be
allocating memory from the local node anyway.

> defer_deactivate_slab() is there for a rare race in retry_load_slab.
> It can be done for !node_match(c->slab, node) too,
> but it feels like a worse-r evil. Especially since kmalloc_nolock()
> doesn't support __GFP_THISNODE.
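To be concrete about the n->list_lock behavior my conclusion right
after this sketch refers to: below is only my rough mental model of the
!allow_spin partial-list path, not the literal code from your series,
and get_partial_node_sketch() is a name I made up for illustration
(assume the usual mm/slub.c context and includes):

/* Sketch only: roughly how I read the !allow_spin partial-list path. */
static struct slab *get_partial_node_sketch(struct kmem_cache *s,
					    struct kmem_cache_node *n,
					    bool allow_spin)
{
	struct slab *slab = NULL;
	unsigned long flags;

	if (allow_spin) {
		spin_lock_irqsave(&n->list_lock, flags);
	} else if (!spin_trylock_irqsave(&n->list_lock, flags)) {
		/*
		 * kmalloc_nolock(): when n->list_lock is contended,
		 * give up on this node's partial list; the caller then
		 * falls back to allocating a brand new slab, even
		 * though partial slabs may be available on the node.
		 */
		return NULL;
	}

	/* ... scan n->partial and take a slab as usual ... */

	spin_unlock_irqrestore(&n->list_lock, flags);
	return slab;
}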
...maybe it is fair to ignore node preference for kmalloc_nolock(), in
the sense that it isn't great to trylock n->list_lock and fail, then
allocate new slabs (even when there are partial slabs available for
the node).

-- 
Cheers,
Harry / Hyeonggon