Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer

Patrick Steinhardt <ps@xxxxxx> · Thu, 3 Apr 2025 11:09:27 +0200

Hi Martin,

On Wed, Apr 02, 2025 at 11:48:01AM -0700, Martin von Zweigbergk wrote:
> Hi,
> 
> The Gerrit, GitButler, and Jujutsu projects all have a concept of
> a "change id", and it behaves in a similar way between the three
> tools. The change id is conceptually associated with a commit.
> It follows a commit as it's rewritten (e.g. by amending and
> rebasing). The three projects currently store and format the
> change id differently. We would like to unify that so we can
> interoperate better. We hope the Git project is also interested
> in preserving and using this header.
> 
> There are many benefits to having a change id even if it's just
> local. I mentioned some in my email to this mailing list in [1].
> For example, it enables
> `git rebase main <change ID>; git switch <change ID>` without
> requiring the user to look up the hash of the rewritten commit.
> If the change id is also transferred between repos and preserved by
> a forge (such as Gerrit), it enables the change id to be used to
> identify a code review.

Agreed, change IDs solve a couple of issues that many users face:

  - You can reliably track how a patch evolves over time. This helps
    tools that need to track the identity of commits, for example
    forges, but also tools like git-range-diff(1).

  - It becomes trivial to see whether a commit has been cherry-picked
    into another branch. We do have git-cherry(1) to do that right now,
    but that command is based on heuristics and fails as soon as the
    patch itself needed to be adapted.

  - Working with history rewrites becomes easier in the general case as
    you don't have to adapt to constantly changing commit IDs.
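To make the cherry-pick point a bit more concrete, here is a toy sketch
in Python. The `Commit` class and `simplified_patch_id()` are made up
for illustration: the latter only hashes the diff with whitespace
removed, which is a rough stand-in for what git-patch-id(1) does, not
its actual algorithm. The sketch also assumes that a cherry-pick keeps
the change ID, which is still an open question.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    change_id: str  # hypothetical field; assumed to survive cherry-picks
    diff: str


def simplified_patch_id(diff: str) -> str:
    # Rough stand-in for git-patch-id(1): hash the diff with all
    # whitespace removed. The real command normalizes more than this.
    normalized = "".join(diff.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def same_by_patch(a: Commit, b: Commit) -> bool:
    return simplified_patch_id(a.diff) == simplified_patch_id(b.diff)


def same_by_change_id(a: Commit, b: Commit) -> bool:
    return a.change_id == b.change_id


original = Commit(
    "ywlktllmukprnxnmzzprukpuwyztylwt",
    "-timeout = 10\n+timeout = 30\n",
)
# The backport had to be adapted because the old branch spells the
# variable differently, so the patch text no longer matches.
backport = Commit(
    "ywlktllmukprnxnmzzprukpuwyztylwt",
    "-time_out = 10\n+time_out = 30\n",
)
```

Comparing by patch content loses track of the adapted backport, while
comparing by change ID still matches the two commits.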

The mere fact that different tools eventually ended up with similar
designs around change IDs is a good indicator that there is a real need
for them out there.

> Here's how the change ids are currently stored and formatted:
> 
>  * Gerrit currently stores change ids in a commit trailer called
>    `Change-Id`. It always starts with the letter 'I' and is
>    followed by 40 hex digits. For example:
>    `Change-Id: Ib563e78c3fedcff262255fa025441daa3202311b`.
> 
>  * GitButler currently stores change ids in a commit footer
>    called `gitbutler-change-id` (older versions used
>    `change-id`). It's written as 32 hex digits separated by
>    dashes as in the UUID  format. For example:
>    `gitbutler-change-id  7d0fbc63-032d-413c-8ae8-610fbeb713c0`.
> 
>  * Jujutsu currently stores change ids in a local storage outside
>    of the Git repo and is therefore not part of the Git commit
>    id. It is stored as 16 bytes. It is rendered to the user as
>    "reverse hex" using 'z' through 'k' as hex digits ('z' = 0,
>    'k' = 15). This allows even short prefixes to be distinguished
>    from commit ids, which is a very useful property when used in
>    the CLI.
> 
> As mentioned, the three projects would like to use the same
> storage and format. I think we have a consensus to store it in a
> Git commit header called `change-id` as 32 reverse-hex digits.
> For example: `change-id ywlktllmukprnxnmzzprukpuwyztylwt`.

I don't mind the actual format too much at this point, so I won't
comment on this part.
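
That said, for anybody following along, the proposed rendering is easy
enough to sketch. The Python below is only an illustration of the
description above (16 bytes shown as 32 "reverse hex" digits, 'z' = 0
through 'k' = 15). The function names are mine, and generating new IDs
from random bytes is an assumption rather than something any of the
tools have committed to.

```python
import os

HEX_DIGITS = "0123456789abcdef"
REVERSE_HEX_DIGITS = "zyxwvutsrqponmlk"  # 'z' = 0 ... 'k' = 15

_TO_REVERSE = str.maketrans(HEX_DIGITS, REVERSE_HEX_DIGITS)
_FROM_REVERSE = str.maketrans(REVERSE_HEX_DIGITS, HEX_DIGITS)


def encode_change_id(raw: bytes) -> str:
    """Render a 16-byte change ID as 32 reverse-hex digits."""
    if len(raw) != 16:
        raise ValueError("a change ID must be exactly 16 bytes")
    return raw.hex().translate(_TO_REVERSE)


def decode_change_id(text: str) -> bytes:
    """Parse 32 reverse-hex digits back into the 16 raw bytes."""
    if len(text) != 32 or any(c not in REVERSE_HEX_DIGITS for c in text):
        raise ValueError("expected 32 reverse-hex digits")
    return bytes.fromhex(text.translate(_FROM_REVERSE))


def new_change_id() -> str:
    # Assumption: a fresh change ID is simply 16 random bytes.
    return encode_change_id(os.urandom(16))
```

Because the alphabet shares no characters with regular hex, even a
short prefix like `ywlk` cannot be mistaken for an abbreviated commit
ID.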

> There is a design doc [2] about the impact on Gerrit and how to
> handle various cases where the client doesn't understand the
> `change-id` header. That also includes some discussion about
> whether cherry-picking should preserve the change id or create a
> new one. I think there is a lot of value in having a
> standardized header regardless of what we decide about
> cherry-picks.
> 
> So, to be clear, this is mostly a heads up at this point; we don't
> depend on any immediate changes from the Git project.

Scott already reached out to me before your mail, and I mentioned to
him that I have been thinking about the problem of change IDs for quite
a while. This was mostly triggered by Jujutsu and how it uses change
IDs, which is one of its good improvements over Git from my
perspective.

While there may not be a need to do anything in Git itself, I would
think that supporting change IDs natively in Git would still be
sensible. Sure, you can emulate them via commit trailers. But I don't
consider trailers to be particularly great as a storage format for this
metadata. After all, you will want to filter the commit graph by change
ID for some of the use cases, and doing that based on a loosely defined
format probably isn't great.
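
As a rough illustration of the difference, here is a Python sketch that
reads the change ID from a raw commit object both ways. The placement
of a `change-id` header after `committer` is an assumption on my side,
and the trailer lookup is a deliberately naive approximation, not what
git-interpret-trailers(1) implements.

```python
import re
from typing import List, Optional, Tuple

RAW_COMMIT = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author A U Thor <author@example.com> 1743671367 +0200\n"
    "committer A U Thor <author@example.com> 1743671367 +0200\n"
    "change-id ywlktllmukprnxnmzzprukpuwyztylwt\n"  # assumed placement
    "\n"
    "Fix frobnication\n"
    "\n"
    "Change-Id: Ib563e78c3fedcff262255fa025441daa3202311b\n"
)


def commit_headers(raw: str) -> List[Tuple[str, str]]:
    """Return (key, value) pairs from everything before the first blank line."""
    header_text, _, _message = raw.partition("\n\n")
    headers: List[Tuple[str, str]] = []
    for line in header_text.split("\n"):
        if line.startswith(" ") and headers:
            # Continuation of a multi-line header value such as gpgsig.
            key, value = headers[-1]
            headers[-1] = (key, value + "\n" + line[1:])
        else:
            key, _, value = line.partition(" ")
            headers.append((key, value))
    return headers


def change_id_from_header(raw: str) -> Optional[str]:
    # Well-defined: one key, one value, in a fixed part of the object.
    for key, value in commit_headers(raw):
        if key == "change-id":
            return value
    return None


_TRAILER = re.compile(r"^change-id:\s*(\S+)\s*$", re.IGNORECASE)


def change_id_from_trailer(raw: str) -> Optional[str]:
    # Naive: look for a matching line in the last paragraph of the message.
    _, _, message = raw.partition("\n\n")
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return None
    for line in reversed(paragraphs[-1].split("\n")):
        match = _TRAILER.match(line.strip())
        if match:
            return match.group(1)
    return None
```

The header lookup never has to look at the free-form message at all,
whereas the trailer lookup has to guess where the trailer block starts
and how strictly to match the key.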

So what would it take to get change IDs into Git? I think the most
important items would be:

  - Generating and writing change IDs in commands that support them.
    This includes e.g. git-commit(1), git-commit-tree(1), git-merge(1),
    git-merge-tree(1). This should of course be completely optional and
    probably be disabled by default.

  - Making tools that rewrite commits aware of change IDs so that they
    know to retain change IDs. This involves e.g. git-cherry-pick(1),
    git-rebase(1), git-replay(1).

  - Extending revisions to allow specifying commits by change ID.

  - Allowing us to filter commit graphs by change ID.

I don't think any of these should be particularly hard to do. Sure,
addressing and filtering commits by change IDs would be slowish at first
because we have to basically read all commits, but this is something
that can be sped up via indices.
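
A toy Python sketch of what I mean, with made-up function names and
commits reduced to (commit ID, change ID) pairs. The scan touches every
commit on every lookup, while the index is built once and then answers
lookups directly. How such an index would be stored on disk is left
open here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (commit ID, change ID or None for commits without one)
CommitEntry = Tuple[str, Optional[str]]


def find_by_scan(commits: Iterable[CommitEntry], change_id: str) -> List[str]:
    """Slow path: read all commits and compare their change IDs."""
    return [commit_id for commit_id, chg in commits if chg == change_id]


def build_index(commits: Iterable[CommitEntry]) -> Dict[str, List[str]]:
    """Map each change ID to the commits carrying it, in input order."""
    index: Dict[str, List[str]] = defaultdict(list)
    for commit_id, chg in commits:
        if chg is not None:
            index[chg].append(commit_id)
    return dict(index)


def find_by_index(index: Dict[str, List[str]], change_id: str) -> List[str]:
    """Fast path: a single dictionary lookup."""
    return index.get(change_id, [])
```

Note that a change ID can map to more than one commit, for example the
versions of a commit before and after a rebase.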

The biggest question is of course backwards compatibility -- can we
introduce a change ID into the commit metadata without breaking existing
users? I guess you'll already have a lot of experience with this given
that you essentially already inject change IDs into metadata, and tools
generally handle this just fine?

I'd certainly be happy to help out with an effort to introduce change
IDs into Git if the community is amenable to such a proposal.

Patrick

NB: I'm also quite happy that Jujutsu has entered the picture as a new
    contender to Git. It has a lot of nice ideas, and in the best case
    Git might be able to learn a few nice tricks from JJ. After all, I
    think we can all benefit from some friendly competition.