Re: [PATCH 1/3] commit: convert pop_most_recent_commit() to prio_queue

René Scharfe <l.s.r@xxxxxx> · Sat, 19 Jul 2025 13:15:28 +0200

On 7/19/25 8:55 AM, Jeff King wrote:
> On Wed, Jul 16, 2025 at 11:39:49AM +0200, René Scharfe wrote:
> 
>> On 7/16/25 7:05 AM, Jeff King wrote:
>>> On Tue, Jul 15, 2025 at 04:51:07PM +0200, René Scharfe wrote:
>>>
>>>> pop_most_recent_commit() calls commit_list_insert_by_date(), which and
>>>> is itself called in a loop, which can lead to quadratic complexity.
>>>> Replace the commit_list with a prio_queue to ensure logarithmic worst
>>>> case complexity and convert all three users.
>>>
>>> I guess I'm cc'd because of my frequent complains about the quadratic
>>> nature of our commit lists? :)
>>
>> And because you introduced prio_queue.
> 
> I think that was Junio, but I think I can be counted as a cheerleader
> for the topic. :)

Ah, sorry.  You did make it stable, though, which allows using it for
backward-compatible history traversal.
>>> I actually have a series turning rev_info.commits into a prio_queue
>>> which I need to polish up (mostly just writing commit messages; I've
>>> been running with it for almost 2 years without trouble). Ironically it
>>> does not touch this spot, as these commit lists are formed on their own.
>>
>> That is not a coincidence.  I had a look at that series and tried to
>> reach its goals while keeping rev_info.commits a commit_list.  Why?
>> Mostly being vaguely uncomfortable with prio_queue' memory overhead,
>> lack of type safety and dual use as a stack.  I still used it, but only
>> as local variable, not in the central struct rev_info.
> 
> Hmm, I would have thought prio_queue had less memory overhead. You're
> spending one pointer per entry in a packed array, versus list nodes. But
> it's true that it doesn't shrink as items are removed (though that is
> something we _could_ implement).

If we just count the net data then a commit_list item has two pointers
and a prio_queue_entry has a pointer and an ID for stability.  That's a
tie.  ALLOC_GROW overallocates by ca. 50%, so that's 25% more on
average for the prio_queue.  No idea what overhead malloc() needs per
allocation, but I guess it's enough to tilt the scale back against
commit_lists.

However, a prio_queue without a comparison function acts as a FIFO
stack, but needs double the amount of memory than a pointer array
without the stability ID would, for the same behavior.

I don't think lack of shrinking causes much of an issue.  I did stumble
over at least one place where using a prio_queue in FIFO mode made the
code slightly but measurably slower than using a commit_list, though,
which could be rectified by using a raw array of pointers.

René