On Sun, Aug 24, 2025 at 10:37:28AM +0200, René Scharfe wrote:

> We could dedicate an object flag to track queue membership, but that
> would leave less for candidate tags, affecting the results. So use a
> hash table, specifically a khash set of commit pointers, to track that.
> This avoids quadratic behaviour in all cases and provides a nice
> performance boost over the previous commit, 08bb69d70f (describe: use
> prio_queue_replace(), 2025-08-03):
>
> Benchmark 1: ./git_08bb69d70f describe $(git rev-list v2.41.0..v2.47.0)
>   Time (mean ± σ):     851.7 ms ±   1.1 ms    [User: 788.7 ms, System: 49.2 ms]
>   Range (min … max):   849.4 ms … 852.8 ms    10 runs
>
> Benchmark 2: ./git describe $(git rev-list v2.41.0..v2.47.0)
>   Time (mean ± σ):     607.1 ms ±   0.9 ms    [User: 544.6 ms, System: 48.6 ms]
>   Range (min … max):   606.1 ms … 608.3 ms    10 runs
>
> Summary
>   ./git describe $(git rev-list v2.41.0..v2.47.0) ran
>     1.40 ± 0.00 times faster than ./git_08bb69d70f describe $(git rev-list v2.41.0..v2.47.0)
>
> Use the commit index value as a hash because it is unique, has the
> right size and needs no computation. We could also derive the hash
> directly from the pointer value, but that requires slightly more
> effort.

Interesting. This is exactly what commit-slabs were created for (and
they are why the convenient index value is there in the first place!).
The idea of commit-slab was to have a zero-cost lookup, done by
indexing into an array (well, really an array-of-arrays).

The biggest downside of commit-slabs is that they allocate one element
per commit. So for a sparse set you end up over-allocating and possibly
suffering cache woes due to requests being far apart. Your technique,
by contrast, trades a little bit of computation (indexing a bucket and
then probing for the match) for a table that scales with the number of
elements actually added to it.

It should be easy to convert between the two and time it. On top of
your patch, I think this works:

diff --git a/builtin/describe.c b/builtin/describe.c
index edb4dec79d..f1d1ce8c8e 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -287,36 +287,38 @@ static void lazy_queue_clear(struct lazy_queue *queue)
 	queue->get_pending = false;
 }
 
-static inline unsigned int commit_index(const struct commit *commit)
-{
-	return commit->index;
-}
-
-static inline int ptr_eq(const void *a, const void *b)
-{
-	return a == b;
-}
+define_commit_slab(commit_counter, int);
 
-KHASH_INIT(commit_set, struct commit *, int, 0, commit_index, ptr_eq)
+struct commit_set {
+	int nr;
+	struct commit_counter present;
+};
 
-static void commit_set_insert(kh_commit_set_t *set, struct commit *commit)
+static void commit_set_insert(struct commit_set *set, struct commit *commit)
 {
-	int added;
-	kh_put_commit_set(set, commit, &added);
+	int *v = commit_counter_at(&set->present, commit);
+	if (!*v) {
+		set->nr++;
+		*v = 1;
+	}
 }
 
-static void commit_set_remove(kh_commit_set_t *set, struct commit *commit)
+static void commit_set_remove(struct commit_set *set, struct commit *commit)
 {
-	khiter_t pos = kh_get_commit_set(set, commit);
-	if (pos != kh_end(set))
-		kh_del_commit_set(set, pos);
+	int *v = commit_counter_peek(&set->present, commit);
+	if (*v) {
+		set->nr--;
+		*v = 0;
+	}
 }
 
 static unsigned long finish_depth_computation(struct lazy_queue *queue,
 					      struct possible_tag *best)
 {
 	unsigned long seen_commits = 0;
-	kh_commit_set_t unflagged = { 0 };
+	struct commit_set unflagged = { 0 };
+
+	init_commit_counter(&unflagged.present);
 
 	for (size_t i = queue->get_pending ? 1 : 0; i < queue->queue.nr; i++) {
 		struct commit *commit = queue->queue.array[i].data;
@@ -330,7 +332,7 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
 		seen_commits++;
 
 		if (c->object.flags & best->flag_within) {
-			if (!kh_size(&unflagged))
+			if (!unflagged.nr)
 				break;
 		} else {
 			commit_set_remove(&unflagged, c);
@@ -354,7 +356,7 @@ static unsigned long finish_depth_computation(struct lazy_queue *queue,
 			parents = parents->next;
 		}
 	}
-	kh_release_commit_set(&unflagged);
+	clear_commit_counter(&unflagged.present);
 
 	return seen_commits;
 }

Here's what I get for timing:

Benchmark 1: ./git.orig describe $(git rev-list v2.41.0..v2.47.0)
  Time (mean ± σ):      1.195 s ±  0.012 s    [User: 1.152 s, System: 0.045 s]
  Range (min … max):    1.175 s …  1.220 s    10 runs

Benchmark 2: ./git.khash describe $(git rev-list v2.41.0..v2.47.0)
  Time (mean ± σ):     912.4 ms ±   5.7 ms    [User: 867.7 ms, System: 46.3 ms]
  Range (min … max):   901.1 ms … 921.2 ms    10 runs

Benchmark 3: ./git.slab describe $(git rev-list v2.41.0..v2.47.0)
  Time (mean ± σ):     937.9 ms ±   7.6 ms    [User: 896.1 ms, System: 43.5 ms]
  Range (min … max):   924.8 ms … 947.9 ms    10 runs

Summary
  ./git.khash describe $(git rev-list v2.41.0..v2.47.0) ran
    1.03 ± 0.01 times faster than ./git.slab describe $(git rev-list v2.41.0..v2.47.0)
    1.31 ± 0.02 times faster than ./git.orig describe $(git rev-list v2.41.0..v2.47.0)

So I see similar speedups vs stock Git using your patch, but the
commit-slab version is just slightly slower.

That of course makes me wonder if we could or should replace the guts
of commit-slab with a hash more like this. Some obvious questions:

  1. Does the hash always perform better? For a dense set, might the
     commit-slab do better? Something like topo-sort would probably
     be a good test there.

  2. Can the hash version handle strides of different sizes? One of
     the points of commit-slab is that the fixed size of the value
     type can be chosen at runtime (so you could have a slab of 32
     bits per commit, or 132, depending on your traversal needs).

  3. How does it perform if we swap the commit->index field for using
     the pointer (a rough sketch of that is below)? If it's similar or
     faster, we could get rid of the commit->index field entirely.
     Besides saving a few bytes and being simpler, that would also
     mean we could start to use the same slab techniques for
     non-commit objects. There are several cases where we use a few
     custom bits in object.flags because we need to cover both commits
     and other objects. But those are error-prone if two subsystems of
     Git use the same bits and happen to run at the same time without
     clearing in between. It would be great if each algorithm could
     declare its own unique flag space (and discard it in one action,
     without iterating yet again to clear the bits); see the second
     sketch below.
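To make question 3 concrete, here is an untested sketch of what keying
on the pointer itself might look like (ptr_eq is the helper your patch
already defines; ptr_hash is a name I just made up, and the shift and
multiplier are guesses on my part, not anything I've measured):

static inline int ptr_eq(const void *a, const void *b)
{
	return a == b;
}

static inline unsigned int ptr_hash(const void *p)
{
	/*
	 * Heap pointers share their low bits due to allocator
	 * alignment, so shift those out (assuming at least 8-byte
	 * alignment), then spread the rest with a Knuth-style
	 * multiplicative constant.
	 */
	uintptr_t v = (uintptr_t)p;
	return (unsigned int)((v >> 3) * 2654435761u);
}

KHASH_INIT(commit_set, struct commit *, int, 0, ptr_hash, ptr_eq)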
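And for the per-algorithm flag space, I imagine an interface vaguely
like the sketch below (all of these names are made up, and this is
hand-waving rather than a tested implementation). The idea is that the
flags live in a hash keyed on the object pointer rather than in
object.flags, so two subsystems can never collide, and discarding a
whole flag space is a single free rather than another traversal:

/* reuses ptr_hash and ptr_eq from the sketch above */
KHASH_INIT(obj_flags, struct object *, unsigned, 1, ptr_hash, ptr_eq)

struct object_flagspace {
	kh_obj_flags_t table;
};

static void flagspace_set(struct object_flagspace *fs, struct object *obj,
			  unsigned bits)
{
	int added;
	khiter_t pos = kh_put_obj_flags(&fs->table, obj, &added);
	if (added)
		kh_value(&fs->table, pos) = 0;
	kh_value(&fs->table, pos) |= bits;
}

static unsigned flagspace_get(struct object_flagspace *fs, struct object *obj)
{
	khiter_t pos = kh_get_obj_flags(&fs->table, obj);
	return pos == kh_end(&fs->table) ? 0 : kh_value(&fs->table, pos);
}

static void flagspace_discard(struct object_flagspace *fs)
{
	/* drops every flag in one action; no per-object iteration */
	kh_release_obj_flags(&fs->table);
}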
-Peff