On Mon, Jun 30, 2025 at 6:28 AM Matt Fleming <mfleming@xxxxxxxxxxxxxx> wrote: > > On Fri, 27 Jun 2025 at 20:36, Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > Good. Now you see my point, right? > > The cond_resched() doesn't fix the issue. > > 1hr to free a trie of 100M elements is horrible. > > Try 100M kmalloc/kfree to see that slab is not the issue. > > trie_free() algorithm is to blame. It doesn't need to start > > from the root for every element. Fix the root cause. > > It doesn't take an hour to free 100M entries, the table showed it > takes about a minute (67 or 62 seconds). yeah. I misread the numbers. > I never claimed that kmalloc/kfree was at fault. I said that the loop > in trie_free() has no preemption, and that's a problem with tries with > millions of entries. > > Of course, rewriting the algorithm used in the lpm trie code would > make this less of an issue. But this would require a major rework. > It's not as simple as improving trie_free() alone. FWIW I tried using > a recursive algorithm in trie_free() and the results are slightly > better, but it still takes multiple seconds to free 10M entries (4.3s) > and under a minute for 100M (56.7s). To fix this properly it's > necessary to use more than two children per node to reduce the height > of the trie. What is the height of 100m tree ? What kind of "recursive algo" you have in mind? Could you try to keep a stack of nodes visited and once leaf is freed pop a node and continue walking. Then total height won't be a factor. The stack would need to be kmalloc-ed, of course, but still should be faster than walking from the root. > And in the meantime, anyone who uses maps with millions > of entries is gonna have the kthread spin in a loop without > preemption. Yes, because judging by this thread I don't believe you'll come back and fix it properly. I'd rather have this acute pain bothering somebody to fix it for good instead of papering over.