On Tue, Jul 1, 2025 at 6:25 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Jun 30, 2025 at 6:28 AM Matt Fleming <mfleming@xxxxxxxxxxxxxx> wrote:
> >
> > On Fri, 27 Jun 2025 at 20:36, Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > Good. Now you see my point, right?
> > > The cond_resched() doesn't fix the issue.
> > > 1hr to free a trie of 100M elements is horrible.
> > > Try 100M kmalloc/kfree to see that slab is not the issue.
> > > trie_free() algorithm is to blame. It doesn't need to start
> > > from the root for every element. Fix the root cause.
> >
> > It doesn't take an hour to free 100M entries, the table showed it
> > takes about a minute (67 or 62 seconds).
>
> yeah. I misread the numbers.
>
> > I never claimed that kmalloc/kfree was at fault. I said that the loop
> > in trie_free() has no preemption, and that's a problem with tries with
> > millions of entries.
> >
> > Of course, rewriting the algorithm used in the lpm trie code would
> > make this less of an issue. But this would require a major rework.
> > It's not as simple as improving trie_free() alone. FWIW I tried using
> > a recursive algorithm in trie_free() and the results are slightly
> > better, but it still takes multiple seconds to free 10M entries (4.3s)
> > and under a minute for 100M (56.7s). To fix this properly it's
> > necessary to use more than two children per node to reduce the height
> > of the trie.
>
> What is the height of 100m tree ?
>
> What kind of "recursive algo" you have in mind?
> Could you try to keep a stack of nodes visited and once leaf is
> freed pop a node and continue walking.
> Then total height won't be a factor.
> The stack would need to be kmalloc-ed, of course,
> but still should be faster than walking from the root.
>
> > And in the meantime, anyone who uses maps with millions
> > of entries is gonna have the kthread spin in a loop without
> > preemption.
>
> Yes, because judging by this thread I don't believe you'll come
> back and fix it properly.
> I'd rather have this acute pain bothering somebody to fix it
> for good instead of papering over.

I think we need both anyway just for the reason we need something to
backport to stable. A full re-implementation of trie might be viewed
as a new feature, but older kernels need to be "fixed" as well.
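
For reference, the stack-based walk suggested in the quoted text might
look roughly like the userspace sketch below. This is not the kernel's
lpm_trie code: struct node, trie_free_with_stack(), build_full() and the
max_height parameter are all hypothetical names made up for illustration,
malloc()/free() stand in for kmalloc()/kfree(), and it assumes the caller
knows an upper bound on the height of the trie. Each node is pushed once
and freed once, so the walk never restarts from the root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a binary trie node. The real kernel node
 * carries more state (prefix length, flags, data); only the two child
 * pointers matter for this sketch. */
struct node {
	struct node *child[2];
};

/* Free every node under root in one pass, keeping the current
 * root-to-node path on an explicit stack.
 *
 * max_height is an assumed upper bound on the number of nodes on the
 * longest root-to-leaf path. Returns the number of nodes freed, or -1
 * if the stack could not be allocated. */
long trie_free_with_stack(struct node *root, size_t max_height)
{
	struct node **stack;
	size_t sp = 0;
	long freed = 0;

	if (!root)
		return 0;
	if (max_height == 0)
		return -1;

	stack = malloc(max_height * sizeof(*stack));
	if (!stack)
		return -1;

	stack[sp++] = root;
	while (sp > 0) {
		struct node *n = stack[sp - 1];	/* peek, do not pop yet */
		int i;

		/* Find a child that has not been visited yet. */
		for (i = 0; i < 2; i++)
			if (n->child[i])
				break;

		if (i < 2) {
			/* Descend, and detach the child so that n looks
			 * like a leaf once both subtrees are gone. */
			assert(sp < max_height);
			stack[sp++] = n->child[i];
			n->child[i] = NULL;
			continue;
		}

		/* n is a leaf now: free it and resume at its parent.
		 * A kernel version could also yield (for example with
		 * cond_resched()) somewhere in this loop. */
		free(n);
		sp--;
		freed++;
	}

	free(stack);
	return freed;
}

/* Demo helper (hypothetical): build a complete binary trie with the
 * given number of levels, i.e. 2^levels - 1 nodes. */
struct node *build_full(unsigned int levels)
{
	struct node *n;

	if (levels == 0)
		return NULL;
	n = calloc(1, sizeof(*n));
	if (!n)
		abort();
	n->child[0] = build_full(levels - 1);
	n->child[1] = build_full(levels - 1);
	return n;
}
```

Calling trie_free_with_stack(build_full(10), 10) frees all nodes of a
ten-level trie with a stack of ten pointers. The stack only ever holds
one path, so its size follows the height of the trie, not the number of
entries.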