Re: [PATCH 1/3] commit: convert pop_most_recent_commit() to prio_queue

Jeff King <peff@xxxxxxxx> · Wed, 16 Jul 2025 01:05:40 -0400

On Tue, Jul 15, 2025 at 04:51:07PM +0200, René Scharfe wrote:

> pop_most_recent_commit() calls commit_list_insert_by_date(), which and
> is itself called in a loop, which can lead to quadratic complexity.
> Replace the commit_list with a prio_queue to ensure logarithmic worst
> case complexity and convert all three users.

I guess I'm cc'd because of my frequent complains about the quadratic
nature of our commit lists? :)

Mostly I asked because I had to look at pop_most_recent_commit() to see
what operation would be made faster here. Looks like it's mostly ":/",
but maybe also fetch's mark_recent_complete_commits()? I guess we might
hit that if you have a huge number of refs?

Anyway, I am in support of the direction regardless.

I actually have a series turning rev_info.commits into a prio_queue
which I need to polish up (mostly just writing commit messages; I've
been running with it for almost 2 years without trouble). Ironically it
does not touch this spot, as these commit lists are formed on their own.

The patch itself looks reasonable. I think here:

> @@ -1461,7 +1462,7 @@ static int get_oid_oneline(struct repository *r,
>  			   const char *prefix, struct object_id *oid,
>  			   const struct commit_list *list)
>  {
> -	struct commit_list *copy = NULL, **copy_tail = &copy;
> +	struct prio_queue copy = { compare_commits_by_commit_date };
>  	const struct commit_list *l;
>  	int found = 0;
>  	int negative = 0;
> @@ -1483,9 +1484,9 @@ static int get_oid_oneline(struct repository *r,
>  
>  	for (l = list; l; l = l->next) {
>  		l->item->object.flags |= ONELINE_SEEN;
> -		copy_tail = &commit_list_insert(l->item, copy_tail)->next;
> +		prio_queue_put(&copy, l->item);
>  	}
> -	while (copy) {
> +	while (copy.nr) {
>  		const char *p, *buf;
>  		struct commit *commit;
>  		int matches;

our callers are always generating and passing in a list. So we could
avoid the overhead of allocating the list in the first place by just
taking a prio_queue. But maybe it gets weird with clearing the
ONELINE_SEEN marks? We make a copy even in the current code so that we
can call clear_commit_marks() on the complete set.

I guess we could add them to an array or something, but that probably
ends up being roughly the same amount of work.

> +build_history () {
> +	local max_level="$1" &&
> +	local level="${2:-1}" &&
> +	local mark="${3:-1}" &&
> +	if test $level -eq $max_level
> +	then
> +		echo "reset refs/heads/master" &&
> +		echo "from $ZERO_OID" &&
> +		echo "commit refs/heads/master" &&
> +		echo "mark :$mark" &&
> +		echo "committer C <c@xxxxxxxxxxx> 1234567890 +0000" &&
> +		echo "data <<EOF" &&
> +		echo "$mark" &&
> +		echo "EOF"
> +	else
> +		local level1=$((level+1)) &&
> +		local mark1=$((2*mark)) &&
> +		local mark2=$((2*mark+1)) &&
> +		build_history $max_level $level1 $mark1 &&
> +		build_history $max_level $level1 $mark2 &&
> +		echo "commit refs/heads/master" &&
> +		echo "mark :$mark" &&
> +		echo "committer C <c@xxxxxxxxxxx> 1234567890 +0000" &&
> +		echo "data <<EOF" &&
> +		echo "$mark" &&
> +		echo "EOF" &&
> +		echo "from :$mark1" &&
> +		echo "merge :$mark2"
> +	fi
> +}

This took some brain cycles to decipher. It looks like we'll make
2^$level commits in a filled tree? It might be worth a brief comment
describing the goal (and maybe even giving an example graph drawing for
N=3 or something, though it gets out of hand quickly).

> +test_perf "rev-parse ':/$(cat needle)'" "
> +	git rev-parse ':/$(cat needle)' >actual
> +"

Hmm, usually we frown on putting snippets inside double-quotes because
it's so easy to accidentally interpolate outside of the test_eval. But
maybe this is short enough to be OK. I guess you did it here especially
so that the title is a nice ":/65535" and not the opaque "$(cat
needle)".

-Peff