On Wed, Jun 18, 2025 at 02:08:39PM -0700, Phil Hord wrote: > From: Phil Hord <phil.hord@xxxxxxxxx> > > When pruning during `git fetch` we check each pruned ref against the > ref_store one at a time to decide whether to report it as dangling. > This causes every local ref to be scanned for each ref being pruned. > > If there are N refs in the repo and M refs being pruned, this code is > O(M*N). However, `git remote prune` uses a very similar function that > is only O(N*log(M)). > > Remove the wasteful ref scanning for each pruned ref and use the faster > version already available in refs_warn_dangling_symrefs. > > In a repo with 126,000 refs, where I was pruning 28,000 refs, this > code made about 3.6 billion calls to strcmp and consumed 410 seconds > of CPU. (Invariably in that time, my remote would timeout and the > fetch would fail anyway.) > > After this change, the same operation completes in under 4 seconds. Very nice. I left some thoughts on the ordering question elsewhere, but I'd be OK with this approach, too. > I considered further optimizing this function to be O(N), but this > requires ref_store iterators to be sorted, too. I found some suggestions > that this is always the case, but I'm not certain it is. > > The current speedup is enough for our needs at the moment. I think we do guarantee the output order, and for-each-ref at least takes advantage of this since 2e7c6d2f41 (ref-filter: format iteratively with lexicographic refname sorting, 2024-10-21). That said, I wouldn't be surprised if there are other n-log-n bits of the code, and we usually consider that "good enough". So stopping here is probably fine. > + struct string_list refnames = STRING_LIST_INIT_NODUP; > const char *dangling_msg = dry_run > ? _(" (%s will become dangling)") > : _(" (%s has become dangling)"); > > + for (ref = stale_refs; ref; ref = ref->next) > + string_list_append(&refnames, ref->name); I was going to suggest using strset over string_list, since I think we prefer that these days for a simple set-inclusion check. But... > + string_list_sort(&refnames); > + refs_warn_dangling_symrefs(get_main_ref_store(the_repository), > + stderr, dangling_msg, &refnames); ...we are ultimately relying on refs_warn_dangling_symrefs(), so we'd have to update its interface. And we also reuse the list (here, after your patch, but already in remote.c) to pass to refs_delete_refs(). So probably not worth it. -Peff