Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer

Phillip Wood <phillip.wood123@xxxxxxxxx> · Tue, 8 Apr 2025 16:58:58 +0100

On 08/04/2025 15:27, Junio C Hamano wrote:
Martin von Zweigbergk <martinvonz@xxxxxxxxxx> writes:

A set of individual commits that share the same "change ID" is,
unlike reflog entries, which are an ordered set of topic tips, not
inherently ordered.  This is inevitable in the distributed world
where many people can simultaneously work on improving a single
"change" in many different ways, but it makes it difficult if not
impossible to see how things evolved, simply because you first need
to figure out the order of these commits that share the same "change
ID".  Some may be independently evolved from the same ancestor
iteration.  Some may be repeatedly worked on on a single strand of
pearls (much like how development recorded in reflog entries of a
single branch in a single user set-up goes).  I guess you would need
a way to record the predecessor vs successor relationship of various
commits that share the same "change ID", much like commits form a DAG
to represent ancestor vs descendant relationship.

That is correct. The change ID should be sufficient for handling
simple distributed cases involving a single remote but it's not a full
replacement for something like Mercurial's Changeset Evolution [1].

Just a random thought.  We could very easily replace "change ID"
with a concept of predecessor-successor commits.

Just like we can represent parents-children NxM transitive relation
only with 0 or more "parent" commit object headers, we can record
zero or more "predecessor" trailers in the commit log.

  (1) a commit with no "predecessor" is like "root commit" in the
      commit history topology.  It is a brand new change that took
      inspiration from nobody else and that is not a polished form of
      any other existing commit.

  (2) a commit created as a refinement for one or more existing
      commits records each of them as "predecessor" to it.  Having
      more than one of them is like a "merge commit" in the commit
      history topology and represents that two patches were squashed
      into one.

  (3) Splitting an originally large change into multiple changes can
      be represented the same way.  They share the same commit as
      their "predecessor".  Perhaps you originally have a two-commit
      series, A and B, and split them differently in such a way that
      C has half of A and D has the rest of A plus B.  In which case,
      C has A as its predecessor while D has both A and B as its
      predecessors.

  (4) Just like we can use auxiliary data structures like bitmaps to
      figure out reachability without following all the links in the
      commit history topology, we should be able to learn how a new
      change was born, and trace how it evolved into newer iteration
      of the moral equivalent of the change, possibly as a series
      with multiple commits, using an auxiliary data structure, which
      would represent predecessor-successor NxM transitive relation
      in a similar way in a form that is efficient to access.

Something like this should allow us to avoid relying on "change ID"s
that can collide elsewhere in the world without having a central
authority to assign them.
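
To make cases (1) to (3) above concrete, here is a rough sketch of how
the predecessor links could be read back out of commit messages. It is
only an illustration under assumptions: the trailer name "Predecessor"
and the helper names parse_predecessors, build_successors and
latest_versions are all hypothetical, commit messages are passed in as
plain strings rather than read from a repository, and trailers are
assumed to sit in the last paragraph of the message.

```python
from collections import defaultdict


def parse_predecessors(message):
    """Return ids named in hypothetical "Predecessor:" trailers.

    Only the last paragraph of the message is inspected.
    """
    last_paragraph = message.strip().split("\n\n")[-1]
    preds = []
    for line in last_paragraph.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "predecessor":
            preds.append(value.strip())
    return preds


def build_successors(commits):
    """Invert the predecessor links: commit id -> set of successor ids."""
    successors = defaultdict(set)
    for commit_id, message in commits.items():
        for pred in parse_predecessors(message):
            successors[pred].add(commit_id)
    return successors


def latest_versions(start, successors):
    """Follow successor links from start; return commits with no successor."""
    seen, stack, tips = set(), [start], set()
    while stack:
        commit_id = stack.pop()
        if commit_id in seen:
            continue
        seen.add(commit_id)
        nxt = successors.get(commit_id, ())
        if not nxt:
            tips.add(commit_id)
        stack.extend(nxt)
    return tips


# The split from case (3): C takes half of A, D takes the rest of A plus B.
commits = {
    "A": "first change\n",
    "B": "second change\n",
    "C": "half of A\n\nPredecessor: A\n",
    "D": "rest of A plus B\n\nPredecessor: A\nPredecessor: B\n",
}
successors = build_successors(commits)
print(sorted(latest_versions("A", successors)))  # ['C', 'D']
print(sorted(latest_versions("B", successors)))  # ['D']
```

A and B have no trailer, so they play the role of "root" changes from
case (1). The walk in latest_versions is a plain graph traversal; the
auxiliary data structure from case (4) would presumably replace it
with something faster.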

This is similar in spirit to the "git evolve" proposal [1]. One of the 
objections to that was that it required all of the rewritten commits to 
be pushed back to the remote, rather than just the current version. So 
if I rewrite a branch three times and push the result for review all of 
the intermediate state gets pushed as well. That is because the 
intermediate commits were needed to track the chain of rewritten commits 
to avoid the problem Elijah described [2] when trying to follow 
cherry-picked-from trailers. If the predecessor information was stored 
separately to the commit it refers to (in a notes ref for example) then 
we could in principle simplify the chain of rewrites when pushing so 
that we only need to push the final version of the commit and a mapping 
from the version that we fetched from the remote.
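
As a rough illustration of that simplification, the sketch below
collapses a local chain of rewrites down to a direct mapping from the
final commit to the nearest predecessors the remote already has. It is
a sketch under assumptions, not a proposal for the actual format: the
function name collapse_rewrites is hypothetical, the predecessor
information is modelled as a plain dict (standing in for whatever a
notes ref or similar would store), and the set of commits known to the
remote is simply given.

```python
def collapse_rewrites(final, predecessors, on_remote):
    """Return the nearest predecessors of final that the remote already has.

    predecessors maps a commit id to the ids it was rewritten from.
    Intermediate local-only rewrites are skipped over.
    """
    result, seen = set(), set()
    stack = list(predecessors.get(final, ()))
    while stack:
        commit_id = stack.pop()
        if commit_id in seen:
            continue
        seen.add(commit_id)
        if commit_id in on_remote:
            # The remote knows this version, so the chain can stop here.
            result.add(commit_id)
        else:
            # Local-only intermediate state: keep walking backwards.
            stack.extend(predecessors.get(commit_id, ()))
    return result


# v0 was fetched from the remote and then rewritten three times locally.
rewrites = {"v1": ["v0"], "v2": ["v1"], "v3": ["v2"]}
print(collapse_rewrites("v3", rewrites, on_remote={"v0"}))  # {'v0'}
```

With that result only v3 and the mapping v3 -> v0 would need to be
pushed; v1 and v2 stay local.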

Tracking predecessors as you describe is certainly a more complete 
solution to tracking the evolution of commits and it addresses the 
shortcomings of change-ids you outlined in your previous mail. It is a 
lot more work to implement though.

Best Wishes

Phillip

[1] https://lore.kernel.org/git/pull.1356.git.1663959324.gitgitgadget@xxxxxxxxx/
[2] https://lore.kernel.org/git/CABPp-BECTrVp9X6bVmzU8LEeYsC3KbzeJvAaDPN+FgZz_uEhmA@xxxxxxxxxxxxxx/