On Thu, Jul 10, 2025 at 8:05 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 7/10/25 12:21, Harry Yoo wrote:
> > On Thu, Jul 10, 2025 at 11:36:02AM +0200, Vlastimil Babka wrote:
> >> On 7/9/25 03:53, Alexei Starovoitov wrote:
> >>
> >> Hm but this is leaking the slab we allocated and have in the "slab"
> >> variable, we need to free it back in that case.

Ohh, sorry for the silly mistake.

Re-reading the diff again I realized that I made a similar mistake
in alloc_single_from_new_slab(). It has this bit:

	if (!alloc_debug_processing(...))
		return NULL;

so I assumed that doing:

	if (!spin_trylock_irqsave(&n->list_lock, ...))
		return NULL;

is ok too. Now I see that the !alloc_debug_processing path is
purposefully leaking memory. Should we add:

@@ -2841,6 +2841,7 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
 	 * It's not really expected that this would fail on a
 	 * freshly allocated slab, but a concurrent memory
 	 * corruption in theory could cause that.
+	 * Leak the newly allocated slab.
 	 */
 	return NULL;

so the next person doesn't make the same mistake?

Also help me understand... slab->objects is never equal to 1, right?
/proc/slabinfo agrees, but I cannot decipher it from the slab init
code. Logically it makes sense. If that's the case, why does
alloc_single_from_new_slab() have this part:

	if (slab->inuse == slab->objects)
		add_full(s, n, slab);
	else
		add_partial(n, slab, DEACTIVATE_TO_HEAD);

Shouldn't it call add_partial() only, since slab->inuse == 1 and
slab->objects != 1?

> >
> > But it might be a partial slab taken from the list?
>
> True.
>
> > Then we need to trylock n->list_lock and if that fails, oh...
>
> So... since we succeeded taking it from the list and thus the spin_trylock,
> it means it's safe to spinlock n->list_lock again - we might be waiting on
> another cpu to unlock it but we know we didn't NMI on our own cpu holding
> the lock, right? But we'd probably need to convince lockdep about this
> somehow, and also remember if we allocated a new slab or took one from the
> partial list... or just deal with this unlikely situation in another irq
> work :/

irq_work might be the least mind-bending.

Good point about partial vs new slab. For a partial slab we can indeed
proceed with deactivate_slab(), and if I'm reading the code correctly,
it won't have new.inuse == 0, so it won't go to discard_slab() (which
wouldn't be safe in this path).

But teaching lockdep that the bit below in deactivate_slab() is safe:

	} else if (new.freelist) {
		spin_lock_irqsave(&n->list_lock, flags);
		add_partial(n, slab, tail);

is a challenge.

Since defer_free_work is already there, I'm leaning toward reusing it
for deactivate_slab() too. It will process

	static DEFINE_PER_CPU(struct llist_head, defer_free_objects);

and

	static DEFINE_PER_CPU(struct llist_head, defer_deactivate_slabs);

Shouldn't be too ugly. Better ideas?
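
Roughly something like this, maybe (completely untested sketch;
slab->llnode and slab->flush_freelist would be new fields reusing
otherwise-unused words in struct slab, and I'm assuming
defer_free_work is the existing per-CPU irq_work from the patch):

/* drained by the existing defer_free_work irq_work */
static DEFINE_PER_CPU(struct llist_head, defer_free_objects);
static DEFINE_PER_CPU(struct llist_head, defer_deactivate_slabs);

/*
 * Called from the spin_trylock(&n->list_lock) failure path instead
 * of deactivating the slab in place.
 */
static void defer_deactivate_slab(struct slab *slab, void *flush_freelist)
{
	/* stash the freelist so the irq_work can finish the job */
	slab->flush_freelist = flush_freelist;
	llist_add(&slab->llnode, this_cpu_ptr(&defer_deactivate_slabs));
	irq_work_queue(this_cpu_ptr(&defer_free_work));
}

static void free_deferred_objects(struct irq_work *work)
{
	struct llist_node *llnode, *pos, *t;

	/* ... existing draining of defer_free_objects ... */

	llnode = llist_del_all(this_cpu_ptr(&defer_deactivate_slabs));
	llist_for_each_safe(pos, t, llnode) {
		struct slab *slab = container_of(pos, struct slab, llnode);

		/*
		 * irq_work runs in a context where taking n->list_lock
		 * is safe again, so the regular deactivation slowpath
		 * applies and lockdep has nothing to complain about.
		 */
		deactivate_slab(slab->slab_cache, slab, slab->flush_freelist);
	}
}

The nice part is that the NMI/trylock-failure path then only does an
llist_add() plus irq_work_queue(), and all list_lock taking happens
in one well-defined context.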