Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer

Jacob Keller <jacob.e.keller@xxxxxxxxx> · Tue, 15 Apr 2025 17:24:58 -0700

On 4/8/2025 7:27 AM, Junio C Hamano wrote:
> Martin von Zweigbergk <martinvonz@xxxxxxxxxx> writes:
> 
>>> A set of individual commits that share the same "change ID" is,
>>> unlike reflog entries which is an ordered set of tip of topics, not
>>> inherently ordered.  This is inevitable in the distributed world
>>> where many people can simultaneously work on improving a single
>>> "change" in many different ways, but making it difficult if not
>>> impossible to see how things evolved, simply because you first need
>>> to figure out the order of these commits that share the same "change
>>> ID".  Some may be independently evolved from the same ancestor
>>> iteration.  Some may be repeatedly worked on on a single strand of
>>> pearls (much like how development recorded in reflog entries of a
>>> single branch in a single user set-up goes).  I guess you would need
>>> a way to record the predecessor vs successor relationship of various
>>> commits that share the same "change ID", much like commits form DAG
>>> to represent ancestor vs descendant relationship.
>>
>> That is correct. The change ID should be sufficient for handling
>> simple distributed cases involving a single remote but it's not a full
>> replacement for something like Mercurial's Changeset Evolution [1].
> 
> Just a random thought.  We could very easily replace "change ID"
> with a concept of predecessor-successor commits.
> 
> Just like we can represent parents-children NxM transitive relation
> only with 0 or more "parent" commit object headers, we can record
> zero or more "predecessor" trailer in the commit log.
> 
>  (1) a commit with no "predecessor" is like "root commit" in the
>      commit history topology.  It is a brand new change that took
>      inspiration from nobody else and that is not a polished form of
>      any other existing commit.
> 
>  (2) a commit created as a refinement for one or more existing
>      commits record each of them as "predecessor" to it.  Having
>      more than one of them is like a "merge commit" in the commit
>      history topology and represents that two patches were squashed
>      into one.
> 
>  (3) Splitting an originally large change into multiple changes can
>      be represented the same way.  They share the same commit as
>      their "predecessor".  Perhaps you have originally two-commit
>      series, A and B, and split them differently in such a way that
>      C has half of A and D has the rest of A plus B.  In which case,
>      C has A as its predecessor while D has both A and B as its
>      predecessor.
> 
>  (4) Just like we can use auxiliary data structures like bitmaps to
>      figure out reachability without following all the links in the
>      commit history topology, we should be able to learn how a new
>      change was born, and trace how it evolved into newer iteration
>      of the moral equivalent of the change, possibly as a series
>      with multiple commits, using auxiliary data structure, which
>      would represent predecessor-successor NxM transitive relation
>      in a similar way in a form that is efficient to access.
> 
> Something like this should allow us to avoid relying on "change ID"s
> that can collide elsewhere in the world without having a central
> authority to assign them.
> 

This does seem like the most "powerful" form of this, but it does lose
one of the "simplicity"-based advantages of change IDs.

Of course, you could simply use the root commit ID in most cases, and
that would be sufficient; in cases where it's not unique, you could
have tooling show more data and allow users to disambiguate.

This approach also likely requires the most "work" to implement on the
git side, compared with storing a simpler single-value header.
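
To make the "root commit ID" idea concrete, here is a rough sketch in
Python. Nothing in it is an existing git feature: the "Predecessor:"
trailer name and both helper functions are made up for illustration,
and the trailer parsing is deliberately naive (it only looks at the
last paragraph of the message). It walks the predecessor links back to
the commits that have none, using the A/B/C/D split from (3) above.

```python
from typing import Dict, List, Set


def parse_predecessors(message: str) -> List[str]:
    """Collect values of hypothetical 'Predecessor:' trailers.

    Only the last paragraph of the commit message is inspected.
    """
    last_paragraph = message.strip().split("\n\n")[-1]
    found = []
    for line in last_paragraph.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "predecessor":
            found.append(value.strip())
    return found


def root_predecessors(commit: str, preds: Dict[str, List[str]]) -> Set[str]:
    """Return every commit reachable via predecessor links that has no
    predecessor of its own (the "roots" of the evolution graph)."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = preds.get(current, [])
        if parents:
            stack.extend(parents)
        else:
            roots.add(current)
    return roots


# Example (3): A and B are brand new; C refines A; D refines A and B.
preds = {"A": [], "B": [], "C": ["A"], "D": ["A", "B"]}
c_roots = root_predecessors("C", preds)  # a single root: usable as an ID
d_roots = root_predecessors("D", preds)  # two roots: needs disambiguation
```

With this data, C resolves to the single root A, so the root ID works
as a stand-in for a change ID. D resolves to both A and B, which is the
non-unique case where tooling would have to show more data.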