Re: [PATCH v2 1/3] commit: convert pop_most_recent_commit() to prio_queue

René Scharfe <l.s.r@xxxxxx> · Sun, 3 Aug 2025 11:54:26 +0200

On 7/21/25 4:02 PM, Lidong Yan wrote:
> René Scharfe <l.s.r@xxxxxx> writes:
>>
>> +#
>> +# Creates lots of merges to make history traversal costly.  In
>> +# particular it creates 2^($max_level-1)-1 2-way merges on top of
>> +# 2^($max_level-1) root commits.  E.g., the commit history looks like
>> +# this for a $max_level of 3:
>> +#
>> +#     _1_
>> +#    /   \
>> +#   2     3
>> +#  / \   / \
>> +# 4   5 6   7
>> +#
>> +# The numbers are the fast-import marks, which also are the commit
>> +# messages.  1 is the HEAD commit and a merge, 2 and 3 are also merges,
>> +# 4-7 are the root commits.
>> +#
> 
> I feel that the reason there's no significant performance improvement is probably
> because mostly we are using the priority queue to sort O(siblings) nodes.
> For example, in this case, the most time-consuming operation is when the priority
> queue or commit list contains 4 and 5, and we then need to insert 6 and 7.
> 
> Assuming the maximum number of siblings is W and the number of nodes is N,
> the time complexity with a commit list is O(W² × N), while using a priority queue
> gives O(W log W × N). Perhaps in many projects, W isn't particularly large,
> which results in the performance improvement not being very significant.

Kinda.  While traversing the history we take a commit from to the
commit_list or prio_queue and put back its parents.  For single-parent
commits this sequence keeps the number of stored items the same.  Merges
increase that number.

We add and retrieve each commit in the (relevant part of) history.  That
takes O(N) and O(1) for the sorted list, and O(log N) and O(log N) for
the prio_queue, where N is the length of the list.

So the best-case history is a string of single-parent commits, keeping
only a single item on the list/queue throughout.  That requires no
sorting or heaping, making the additions and retrievals O(1).  The
overall complexity is then O(N) for both variants, N being the number
of commits in the history.

Worst-case history might be a single merge of all commits -- a
centipede or myriapod?  With all commits on the sorted list we get a
complexity of O(N²) for the traversal, and O(N log N) with a prio_queue.

René