Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer

"Kristoffer Haugsbakk" <kristofferhaugsbakk@xxxxxxxxxxxx> · Wed, 14 May 2025 17:08:53 +0200

What a thread!

On Tue, Apr 8, 2025, at 16:27, Junio C Hamano wrote:
> Martin von Zweigbergk <martinvonz@xxxxxxxxxx> writes:
>
>>> A set of individual commits that share the same "change ID" is,
>>> unlike reflog entries which is an ordered set of tip of topics, not
>>> inherently ordered.  This is inevitable in the distributed world
>>> where many people can simultaneously work on improving a single
>>> "change" in many different ways, but making it difficult if not
>>> impossible to see how things evolved, simply because you first need
>>> to figure out the order of these commits that share the same "change
>>> ID".  Some may be independently evolved from the same ancestor
>>> iteration.  Some may be repeatedly worked on on a single strand of
>>> pearls (much like how development recorded in reflog entries of a
>>> single branch in a single user set-up goes).  I guess you would need
>>> a way to record the predecessor vs successor relationship of various
>>> commits that share the same "change ID", much like commits form DAG
>>> to represent ancestor vs descendant relationship.
>>
>> That is correct. The change ID should be sufficient for handling
>> simple distributed cases involving a single remote but it's not a full
>> replacement for something like Mercurial's Changeset Evolution [1].
>
> Just a random thought.  We could very easily replace "change ID"
> with a concept of predecessor-successor commits.
>
> Just like we can represent parents-children NxM transitive relation
> only with 0 or more "parent" commit object headers, we can record
> zero or more "predecessor" trailer in the commit log.
>
>  (1) a commit with no "predecessor" is like "root commit" in the
>      commit history topology.  It is a brand new change that took
>      inspiration from nobody else and that is not a polished form of
>      any other existing commit.
>
>  (2) a commit created as a refinement for one or more existing
>      commits record each of them as "predecessor" to it.  Having
>      more than one of them is like a "merge commit" in the commit
>      history topology and represents that two patches were squashed
>      into one.
>
>  (3) Splitting an originally large change into multiple changes can
>      be represented the same way.  They share the same commit as
>      their "predecessor".  Perhaps you have originally two-commit
>      series, A and B, and split them differently in such a way that
>      C has half of A and D has the rest of A plus B.  In which case,
>      C has A as its predecessor while D has both A and B as its
>      predecessor.
>
>  (4) Just like we can use auxiliary data structures like bitmaps to
>      figure out reachability without following all the links in the
>      commit history topology, we should be able to learn how a new
>      change was born, and trace how it evolved into newer iteration
>      of the moral equivalent of the change, possibly as a series
>      with multiple commits, using auxiliary data structure, which
>      would represent predecessor-successor NxM transitive relation
>      in a similar way in a form that is efficient to access.
>
> Something like this should allow us avoid relying on "change ID"s
> that can collide elsewhere in the world without having a central
> authority to assign them.
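
To make the quoted idea concrete for myself, here is a minimal Python
sketch of case (3), assuming the "predecessor" trailers have already
been read into a plain dict.  The names `predecessors`, `successors_of`
and `evolved_from` are made up for illustration; nothing like this
exists in Git today.

```python
from collections import defaultdict

# Hypothetical in-memory model: commit -> list of its "predecessor" commits.
# A and B are the original two-commit series; C and D are the re-split
# described in case (3) of the quoted message.
predecessors = {
    "A": [],
    "B": [],
    "C": ["A"],
    "D": ["A", "B"],
}


def successors_of(preds):
    """Invert the predecessor map, like deriving children from parent headers."""
    succ = defaultdict(list)
    for commit, ps in preds.items():
        for p in ps:
            succ[p].append(commit)
    return dict(succ)


def evolved_from(commit, preds):
    """Return every commit reachable through predecessor links, transitively."""
    seen = set()
    stack = list(preds.get(commit, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(preds.get(p, []))
    return seen


# Commits without any predecessor are the "root" changes from case (1).
roots = sorted(c for c, ps in predecessors.items() if not ps)
```

Here `evolved_from("D", predecessors)` gives both A and B, and the
inverted map shows A with two successors (the split).  A real
implementation would presumably want the auxiliary index from (4)
instead of walking the links like this.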

I have a few submissions where I recorded the commit hash of the patch
and the commits of its previous versions in the email headers.

https://lore.kernel.org/git/0ab05a4cf09ba02016b4493936ad1b092b1326aa.1730979849.git.code@xxxxxxxxxxxxxxx/

For this one (v3):[1]

```
X-Commit-Hash: 0ab05a4cf09ba02016b4493936ad1b092b1326aa
X-Previous-Commits: c50f9d405f9043a03cb5ca1855fbf27f9423c759 63a431537b78e2d84a172b5c837adba6184a1f1b
```

• `X-Commit-Hash`: my local commit for this patch
• `X-Previous-Commits`: the two previous commits (v1 and v2 in arbitrary order)

Version 1 just has the hash:

https://lore.kernel.org/git/63a431537b78e2d84a172b5c837adba6184a1f1b.1729451376.git.code@xxxxxxxxxxxxxxx/

```
X-Commit-Hash: 63a431537b78e2d84a172b5c837adba6184a1f1b
```

And v2:

https://lore.kernel.org/git/c50f9d405f9043a03cb5ca1855fbf27f9423c759.1730234365.git.code@xxxxxxxxxxxxxxx/

```
X-Commit-Hash: c50f9d405f9043a03cb5ca1855fbf27f9423c759
X-Previous-Commits: 63a431537b78e2d84a172b5c837adba6184a1f1b
```

† 1: The hash is in the message-id in my case.  But I wanted a dedicated
    field instead of taking it out of the msg id.  And the msg id makeup
    doesn’t seem documented.  I’ve already seen a thread where someone
    relied on parsing data out of the msg id until it changed from under
    them.
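
Since v3 lists its previous commits in arbitrary order, the version
order has to be recovered from the links themselves.  A rough Python
sketch of that, using only the three header blocks shown above (the
helper name `commit_links` is made up, and this is just one way to do
it, not something my tooling actually runs):

```python
from email.parser import HeaderParser
from graphlib import TopologicalSorter  # Python 3.9+

# Header blocks copied from the three submissions above (v3, v1, v2).
MESSAGES = [
    "X-Commit-Hash: 0ab05a4cf09ba02016b4493936ad1b092b1326aa\n"
    "X-Previous-Commits: c50f9d405f9043a03cb5ca1855fbf27f9423c759"
    " 63a431537b78e2d84a172b5c837adba6184a1f1b\n\n",
    "X-Commit-Hash: 63a431537b78e2d84a172b5c837adba6184a1f1b\n\n",
    "X-Commit-Hash: c50f9d405f9043a03cb5ca1855fbf27f9423c759\n"
    "X-Previous-Commits: 63a431537b78e2d84a172b5c837adba6184a1f1b\n\n",
]


def commit_links(raw_headers):
    """Return (commit, [previous commits]) for one message's headers."""
    msg = HeaderParser().parsestr(raw_headers)
    commit = msg["X-Commit-Hash"].strip()
    # Version 1 has no X-Previous-Commits header at all.
    previous = (msg.get("X-Previous-Commits") or "").split()
    return commit, previous


# Map each commit to the commits it supersedes, then sort so that
# every version comes after all the versions it lists as previous.
graph = dict(commit_links(m) for m in MESSAGES)
order = list(TopologicalSorter(graph).static_order())
```

With these three messages `order` comes out as v1, v2, v3, regardless
of the order in which the messages or the hashes in
`X-Previous-Commits` are given.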

>  (4) Just like we can use auxiliary data structures like bitmaps to
>      figure out reachability without following all the links in the
>      commit history topology, we should be able to learn how a new
>      change was born, and trace how it evolved into newer iteration
>      of the moral equivalent of the change, possibly as a series
>      with multiple commits, using auxiliary data structure, which
>      would represent predecessor-successor NxM transitive relation
>      in a similar way in a form that is efficient to access.

I don’t know if this is related, but it would be amazing if we users
could define custom indexes on the DB.  Maybe people won’t agree on what
a change-id should mean (judging by this thread?), but with custom
indexes you could maybe get fast queries for whatever “id” you want to
define.

Unrelated example: defining an index on `git patch-id --stable` for
quick *cherry* checks without making your own table with:

```
<rev list> | git diff-tree --patch --stdin \
    | git patch-id --stable
```
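
For comparison, the "make your own table" approach could look roughly
like this in Python.  It assumes the usual `git patch-id` output of one
`<patch-id> <commit-id>` pair per line; the function names and the
sample hashes are placeholders I made up, not real commits.

```python
from collections import defaultdict


def build_patch_id_index(patch_id_output):
    """Map each patch ID to the commits that produced it."""
    index = defaultdict(list)
    for line in patch_id_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        patch_id, commit = line.split()
        index[patch_id].append(commit)
    return dict(index)


def equivalent_commits(index, patch_id):
    """Commits whose diff has the given patch ID (the *cherry* check)."""
    return index.get(patch_id, [])


# Placeholder output; real input would come from the pipeline above.
SAMPLE = "\n".join([
    "a" * 40 + " " + "1" * 40,
    "b" * 40 + " " + "2" * 40,
    "a" * 40 + " " + "3" * 40,  # same patch ID: a cherry-pick of the first
])

index = build_patch_id_index(SAMPLE)
```

The table has to be rebuilt or updated by hand whenever new commits
arrive, which is exactly the bookkeeping a built-in custom index would
take off the user's hands.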