On Fri, 27 Jun 2025 at 20:36, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > Good. Now you see my point, right? > The cond_resched() doesn't fix the issue. > 1hr to free a trie of 100M elements is horrible. > Try 100M kmalloc/kfree to see that slab is not the issue. > trie_free() algorithm is to blame. It doesn't need to start > from the root for every element. Fix the root cause. It doesn't take an hour to free 100M entries, the table showed it takes about a minute (67 or 62 seconds). I never claimed that kmalloc/kfree was at fault. I said that the loop in trie_free() has no preemption, and that's a problem with tries with millions of entries. Of course, rewriting the algorithm used in the lpm trie code would make this less of an issue. But this would require a major rework. It's not as simple as improving trie_free() alone. FWIW I tried using a recursive algorithm in trie_free() and the results are slightly better, but it still takes multiple seconds to free 10M entries (4.3s) and under a minute for 100M (56.7s). To fix this properly it's necessary to use more than two children per node to reduce the height of the trie. And in the meantime, anyone who uses maps with millions of entries is gonna have the kthread spin in a loop without preemption. Thanks, Matt