On Thu, Jul 10, 2025 at 12:13:20PM -0700, Alexei Starovoitov wrote:
> On Thu, Jul 10, 2025 at 8:05 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >
> > On 7/10/25 12:21, Harry Yoo wrote:
> > > On Thu, Jul 10, 2025 at 11:36:02AM +0200, Vlastimil Babka wrote:
> > >> On 7/9/25 03:53, Alexei Starovoitov wrote:
> > >>
> > >> Hm, but this is leaking the slab we allocated and have in the "slab"
> > >> variable, we need to free it back in that case.
>
> ohh. sorry for the silly mistake.
> Re-reading the diff again, I realized that I made a similar mistake in
> alloc_single_from_new_slab().
> It has this bit:
>
>	if (!alloc_debug_processing(...))
>		return NULL;

Yeah, but we purposefully leak slabs if !alloc_debug_processing();
the same happens in alloc_single_from_partial().

> so I assumed that doing:
>
>	if (!spin_trylock_irqsave(&n->list_lock, ...))
>		return NULL;
>
> is ok too. Now I see that !alloc_debug is purposefully leaking memory.
>
> Should we add:
>
> @@ -2841,6 +2841,7 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
>		 * It's not really expected that this would fail on a
>		 * freshly allocated slab, but a concurrent memory
>		 * corruption in theory could cause that.
> +		 * Leak the newly allocated slab.
>		 */
>		return NULL;
>
> so the next person doesn't make the same mistake?

Looks fine. Probably add a comment to alloc_single_from_partial() as well?

> Also help me understand...
> slab->objects is never equal to 1, right?

No. For example, if you allocate a 4k object and SLUB allocates the slab
with oo_order(s->min), slab->objects will be 1.
/proc/slabinfo only prints oo_order(s->oo), not oo_order(s->min).

> /proc/slabinfo agrees, but I cannot decipher it through slab init code.
> Logically it makes sense.

I think the reason there is no <objsperslab> == 1 in your /proc/slabinfo
is that calculate_order() tries to choose a higher order for slabs
(based on the number of CPUs) to reduce lock contention.

But nothing prevents slab->objects from being 1.

> If that's the case, why does alloc_single_from_new_slab()
> have this part:
>
>	if (slab->inuse == slab->objects)
>		add_full(s, n, slab);
>	else
>		add_partial(n, slab, DEACTIVATE_TO_HEAD);
>
> Shouldn't it call add_partial() only,
> since slab->inuse == 1 and slab->objects != 1?

...and that means we need to handle slab->inuse == slab->objects.

> > > But it might be a partial slab taken from the list?
> >
> > True.
> >
> > > Then we need to trylock n->list_lock and if that fails, oh...
> >
> > So... since we succeeded taking it from the list and thus the spin_trylock,
> > it means it's safe to spin_lock n->list_lock again - we might be waiting on
> > another cpu to unlock it, but we know we didn't NMI on our own cpu while
> > holding the lock, right? But we'd probably need to convince lockdep about
> > this somehow, and also remember whether we allocated a new slab or took one
> > from the partial list... or just deal with this unlikely situation in
> > another irq work :/
>
> irq_work might be the least mind-bending.
> Good point about partial vs new slab.
> For partial we can indeed proceed with deactivate_slab() and, if
> I'm reading the code correctly, it won't have new.inuse == 0,
> so it won't go to discard_slab() (which won't be safe in this path).
> But teaching lockdep that the bit below in deactivate_slab() is safe:
>
>	} else if (new.freelist) {
>		spin_lock_irqsave(&n->list_lock, flags);
>		add_partial(n, slab, tail);
>
> is a challenge.
>
> Since defer_free_work is there, I'm leaning to reuse it for
> deactivate_slab too. It will process
>
>	static DEFINE_PER_CPU(struct llist_head, defer_free_objects);
>
> and
>
>	static DEFINE_PER_CPU(struct llist_head, defer_deactivate_slabs);

+1 for another irq work for slab deactivation,
it should be rare anyway...

> Shouldn't be too ugly. Better ideas?
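No better idea from me. A minimal, untested sketch of how the deferred
side could look, assuming we reuse the existing defer_free_work irq_work
(and that it is per-CPU); the llnode and flush_freelist members of
struct slab, and the free_deferred_objects() callback name, are made up
here for illustration:

static DEFINE_PER_CPU(struct llist_head, defer_deactivate_slabs);

/* Called where taking n->list_lock is not safe (e.g. in NMI context). */
static void defer_deactivate_slab(struct slab *slab, void *freelist)
{
	struct llist_head *head = this_cpu_ptr(&defer_deactivate_slabs);

	slab->flush_freelist = freelist;	/* hypothetical field */
	llist_add(&slab->llnode, head);		/* hypothetical field */
	irq_work_queue(this_cpu_ptr(&defer_free_work));
}

/* The existing irq_work callback grows a second drain loop. */
static void free_deferred_objects(struct irq_work *work)
{
	struct llist_head *head = this_cpu_ptr(&defer_deactivate_slabs);
	struct llist_node *pos, *t;

	/* ... existing draining of defer_free_objects ... */

	llist_for_each_safe(pos, t, llist_del_all(head)) {
		struct slab *slab = container_of(pos, struct slab, llnode);

		/*
		 * irq_work runs in hardirq context, so taking
		 * n->list_lock inside deactivate_slab() is fine here.
		 */
		deactivate_slab(slab->slab_cache, slab,
				slab->flush_freelist);
	}
}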
--
Cheers,
Harry / Hyeonggon