On 7/10/25 21:13, Alexei Starovoitov wrote:
> On Thu, Jul 10, 2025 at 8:05 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>>
>> On 7/10/25 12:21, Harry Yoo wrote:
>> > On Thu, Jul 10, 2025 at 11:36:02AM +0200, Vlastimil Babka wrote:
>> >> On 7/9/25 03:53, Alexei Starovoitov wrote:
>> >>
>> >> Hm but this is leaking the slab we allocated and have in the "slab"
>> >> variable, we need to free it back in that case.
>
> ohh. sorry for the silly mistake.
> Re-reading the diff again I realized that I made a similar mistake in
> alloc_single_from_new_slab().
> It has this bit:
> 	if (!alloc_debug_processing(...))
> 		return NULL;
>
> so I assumed that doing:
> 	if (!spin_trylock_irqsave(&n->list_lock,..))
> 		return NULL;
>
> is ok too. Now I see that the !alloc_debug_processing() case is
> purposefully leaking memory.
>
> Should we add:
> @@ -2841,6 +2841,7 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
>  	 * It's not really expected that this would fail on a
>  	 * freshly allocated slab, but a concurrent memory
>  	 * corruption in theory could cause that.
> +	 * Leak newly allocated slab.
>  	 */
>  	return NULL;
>
> so the next person doesn't make the same mistake?

Sure, thanks.

>> So... since we succeeded taking it from the list and thus the spin_trylock,
>> it means it's safe to spinlock n->list_lock again - we might be waiting on
>> another cpu to unlock it, but we know we didn't NMI on our own cpu while
>> holding the lock, right? But we'd probably need to convince lockdep about
>> this somehow, and also remember whether we allocated a new slab or took one
>> from the partial list... or just deal with this unlikely situation in
>> another irq work :/
>
> irq_work might be the least mind bending.
> Good point about partial vs new slab.
> For partial we can indeed proceed with deactivate_slab() and if
> I'm reading the code correctly, it won't have new.inuse == 0,

Hm, I think due to other cpus freeing objects concurrently it might end
up with 0.

> so it won't go to discard_slab() (which won't be safe in this path).
> But teaching lockdep that the below bit in deactivate_slab() is safe:
> 	} else if (new.freelist) {
> 		spin_lock_irqsave(&n->list_lock, flags);
> 		add_partial(n, slab, tail);
> is a challenge.
>
> Since defer_free_work is there, I'm leaning towards reusing it for
> deactivate_slab() too. It will process
> static DEFINE_PER_CPU(struct llist_head, defer_free_objects);
> and
> static DEFINE_PER_CPU(struct llist_head, defer_deactivate_slabs);

Should work. Also deactivate_slab() should be the correct operation for
both a slab taken from the partial list and a newly allocated one.

But oops, where do we store all the parameters for deactivate_slab()?

We can probably reuse the union with "struct list_head slab_list" for
queueing. The kmem_cache pointer can simply be taken from struct slab,
it's already there. But what about the separate flush_freelist pointer?
Maybe take advantage of struct list_head being two pointers and struct
llist_node just one, so what we need will still fit? (rough sketch at
the end of this mail)

Otherwise we could do the first two phases of deactivate_slab()
immediately and only defer the third phase, where the freelists are
already merged and there's no freelist pointer to handle anymore. But if
it's not necessary, let's not complicate things.

Also, should the kmem_cache_destroy() path now get a barrier to flush
all pending irq_work? Does such a barrier exist?

> Shouldn't be too ugly. Better ideas?
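
Not really a better idea, but to make the llist reuse concrete, here is
a rough, completely untested sketch of what I mean. All names are made
up, defer_free_work is assumed to be the per-cpu irq_work from your
deferred-free patch, and it assumes nothing else touches
slab->slab_list while the slab is waiting to be deactivated:

	/*
	 * struct list_head is two pointers, struct llist_node only one,
	 * so overlay a llist_node plus the flush_freelist pointer on
	 * top of slab->slab_list.
	 */
	struct defer_deactivate {
		struct llist_node llnode;	/* overlays slab_list.next */
		void *flush_freelist;		/* overlays slab_list.prev */
	};

	static DEFINE_PER_CPU(struct llist_head, defer_deactivate_slabs);

	/* NMI-safe: llist_add() and irq_work_queue() take no locks */
	static void defer_deactivate_slab(struct slab *slab, void *flush_freelist)
	{
		struct defer_deactivate *dd = (void *)&slab->slab_list;

		dd->flush_freelist = flush_freelist;
		llist_add(&dd->llnode, this_cpu_ptr(&defer_deactivate_slabs));
		irq_work_queue(this_cpu_ptr(&defer_free_work));
	}

	/* called from the irq_work handler, where taking list_lock is fine */
	static void flush_deferred_deactivate(void)
	{
		struct llist_node *head, *pos, *next;

		head = llist_del_all(this_cpu_ptr(&defer_deactivate_slabs));
		llist_for_each_safe(pos, next, head) {
			struct defer_deactivate *dd =
				container_of(pos, struct defer_deactivate, llnode);
			struct slab *slab =
				container_of((struct list_head *)dd,
					     struct slab, slab_list);

			/* kmem_cache pointer recovered from the slab itself */
			deactivate_slab(slab->slab_cache, slab, dd->flush_freelist);
		}
	}

The overlay relies on the deferred slab never sitting on a real list at
the same time, which should hold since it was either just taken off the
partial list or freshly allocated.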
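
As for the barrier: irq_work_sync() does exist, but only for a single
irq_work, so the kmem_cache_destroy() path would have to walk the CPUs.
Again just a sketch with the same assumed defer_free_work name:

	/*
	 * Wait for any deferred free/deactivate work still in flight.
	 * By the time kmem_cache_destroy() runs there must be no new
	 * allocations from the cache, so no new work gets queued for
	 * it after this.
	 */
	static void flush_all_deferred_work(void)
	{
		int cpu;

		for_each_possible_cpu(cpu)
			irq_work_sync(per_cpu_ptr(&defer_free_work, cpu));
	}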