On 08/04/2025 15:27, Junio C Hamano wrote:
Martin von Zweigbergk <martinvonz@xxxxxxxxxx> writes:
A set of individual commits that share the same "change ID" is,
unlike reflog entries, which form an ordered set of topic tips, not
inherently ordered. This is inevitable in the distributed world
where many people can simultaneously work on improving a single
"change" in many different ways, but it makes it difficult if not
impossible to see how things evolved, simply because you first need
to figure out the order of these commits that share the same "change
ID". Some may be independently evolved from the same ancestor
iteration. Some may be repeatedly worked on along a single strand of
pearls (much like how development recorded in reflog entries of a
single branch in a single user set-up goes). I guess you would need
a way to record the predecessor vs successor relationship of various
commits that share the same "change ID", much like commits form a DAG
to represent the ancestor vs descendant relationship.
That is correct. The change ID should be sufficient for handling
simple distributed cases involving a single remote but it's not a full
replacement for something like Mercurial's Changeset Evolution [1].
Just a random thought. We could very easily replace "change ID"
with a concept of predecessor-successor commits.
Just like we can represent the parents-children NxM transitive relation
with only zero or more "parent" commit object headers, we can record
zero or more "predecessor" trailers in the commit log.
(1) a commit with no "predecessor" is like "root commit" in the
commit history topology. It is a brand new change that took
inspiration from nobody else and that is not a polished form of
any other existing commit.
(2) a commit created as a refinement of one or more existing
commits records each of them as its "predecessor". Having
more than one of them is like a "merge commit" in the commit
history topology and represents that two patches were squashed
into one.
(3) Splitting an originally large change into multiple changes can
be represented the same way. They share the same commit as
their "predecessor". Perhaps you originally have a two-commit
series, A and B, and split it differently in such a way that
C has half of A and D has the rest of A plus B. In that case,
C has A as its predecessor while D has both A and B as its
predecessors.
(4) Just like we can use auxiliary data structures like bitmaps to
figure out reachability without following all the links in the
commit history topology, we should be able to learn how a new
change was born, and trace how it evolved into a newer iteration
of the moral equivalent of the change, possibly as a series
with multiple commits, using an auxiliary data structure, which
would represent the predecessor-successor NxM transitive relation
in a similar way in a form that is efficient to access.
Something like this should allow us to avoid relying on "change ID"s
that can collide elsewhere in the world without having a central
authority to assign them.
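
To make points (1) to (4) above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the "Predecessor:" trailer name, the one-ID-per-line format, and all function names are hypothetical, since nothing in this thread fixes them, and a plain dict of messages stands in for real commit objects. The inverted index plays the role of the auxiliary data structure from (4).

```python
from collections import defaultdict

# Hypothetical trailer key; the thread does not settle on a name or format.
TRAILER = "Predecessor:"

def predecessors_of(message):
    """Return the commit IDs named in 'Predecessor:' lines of a commit message."""
    return [line[len(TRAILER):].strip()
            for line in message.splitlines()
            if line.startswith(TRAILER)]

def build_successor_index(commits):
    """Invert the trailers: map each commit ID to the commits that refine it."""
    successors = defaultdict(set)
    for commit_id, message in commits.items():
        for pred in predecessors_of(message):
            successors[pred].add(commit_id)
    return successors

def latest_versions(commit_id, successors):
    """Follow successor links transitively to the commits nothing has superseded."""
    seen, stack, tips = set(), [commit_id], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        nexts = successors.get(current, ())
        if not nexts:
            tips.add(current)
        stack.extend(nexts)
    return tips

# The split from (3): C takes half of A, D takes the rest of A plus B.
commits = {
    "A": "first half of the feature\n",
    "B": "second half of the feature\n",
    "C": "part of A\n\nPredecessor: A\n",
    "D": "rest of A plus B\n\nPredecessor: A\nPredecessor: B\n",
}
index = build_successor_index(commits)
print(sorted(latest_versions("A", index)))  # ['C', 'D']
print(sorted(latest_versions("B", index)))  # ['D']
```

A and B carry no trailer, so they are the "root" changes of (1), and D, with two trailers, is the squash-like case of (2).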
This is similar in spirit to the "git evolve" proposal [1]. One of the
objections to that was that it required all of the rewritten commits to
be pushed back to the remote, rather than just the current version. So
if I rewrite a branch three times and push the result for review all of
the intermediate state gets pushed as well. That is because the
intermediate commits were needed to track the chain of rewritten commits
to avoid the problem Elijah described [2] when trying to follow
cherry-picked-from trailers. If the predecessor information were stored
separately from the commit it refers to (in a notes ref for example) then
we could in principle simplify the chain of rewrites when pushing so
that we only need to push the final version of the commit and a mapping
from the version that we fetched from the remote.
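
As a rough sketch of what simplifying the chain could look like, assuming the predecessor links are kept in a separate mapping (a plain dict here stands in for whatever a notes ref would hold) and that we know which commits the remote already has. The function name and the data layout are made up for illustration and are not an existing git interface.

```python
def simplify_rewrites(final, predecessors, known_to_remote):
    """Collapse a chain of local rewrites into a single mapping for `final`.

    Walks predecessor links back from `final` and stops at the first commits
    the remote already has, so intermediate local versions are skipped.
    """
    result, seen = set(), set()
    stack = list(predecessors.get(final, ()))
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        if commit in known_to_remote:
            result.add(commit)
        else:
            stack.extend(predecessors.get(commit, ()))
    return result

# v0 was fetched from the remote, then rewritten three times locally.
predecessors = {"v1": ["v0"], "v2": ["v1"], "v3": ["v2"]}
print(simplify_rewrites("v3", predecessors, known_to_remote={"v0"}))  # {'v0'}
```

With this, only v3 and the single mapping "v3 replaces v0" would need to be pushed; v1 and v2 can stay local.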
Tracking predecessors as you describe is certainly a more complete
solution to tracking the evolution of commits and it addresses the
shortcomings of change-ids you outlined in your previous mail. It is a
lot more work to implement though.
Best Wishes
Phillip
[1] https://lore.kernel.org/git/pull.1356.git.1663959324.gitgitgadget@xxxxxxxxx/
[2] https://lore.kernel.org/git/CABPp-BECTrVp9X6bVmzU8LEeYsC3KbzeJvAaDPN+FgZz_uEhmA@xxxxxxxxxxxxxx/