Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer

"Remo Senekowitsch" <remo@xxxxxxxxxxx> · Thu, 03 Apr 2025 12:38:52 +0200

Hi Patrick,

On Thu Apr 3, 2025 at 11:09 AM CEST, Patrick Steinhardt wrote:
> On Wed, Apr 02, 2025 at 11:48:01AM -0700, Martin von Zweigbergk wrote:
>>
>> As mentioned, the three projects would like to use the same
>> storage and format. I think we have a consensus to store it in a
>> Git commit header called `change-id` as 32 reverse-hex digits.
>> For example: `change-id ywlktllmukprnxnmzzprukpuwyztylwt`.
>
> I don't mind the actual format too much at this point, so I won't
> comment on this part.

Gerrit and GitButler didn't mind the format either, which is why they
agreed to adopt Jujutsu's. There is also no technical reason why
Jujutsu couldn't support a free-form ID. However, discussing a
standard for the ecosystem gives us the opportunity to pick something
that everybody can rely on and benefit from.

Some benefits of the proposed format include:
- a known, fixed memory requirement (32 digits, i.e. 16 bytes)
- a change-id used as part of a URL never has to be escaped
- because the IDs are uniformly distributed like a hash, the shortest
  unambiguous prefix stays as short as possible
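
To make the format concrete, here is a minimal Python sketch of
validating and converting a change-id. It assumes the reverse-hex
alphabet Jujutsu uses, where hex digit 0 is written as `z` and f as `k`,
so the 32 digits encode 16 bytes using only the letters k through z.
The function names are hypothetical, and generating a new ID from 16
random bytes is an assumption of this sketch, not part of the proposal.

```python
import secrets

# Reverse hex: hex digit d (0..15) is written as chr(ord("z") - d),
# i.e. 0 -> z, 1 -> y, ..., f -> k.
_HEX = "0123456789abcdef"
_REVERSE_HEX = "zyxwvutsrqponmlk"
_TO_REVERSE = str.maketrans(_HEX, _REVERSE_HEX)
_FROM_REVERSE = str.maketrans(_REVERSE_HEX, _HEX)

CHANGE_ID_LEN = 32  # 32 reverse-hex digits = 16 bytes


def is_valid_change_id(s: str) -> bool:
    """True if s is exactly 32 reverse-hex digits (letters k..z)."""
    return len(s) == CHANGE_ID_LEN and all(c in _REVERSE_HEX for c in s)


def change_id_to_bytes(s: str) -> bytes:
    """Decode a change-id into its 16 raw bytes."""
    if not is_valid_change_id(s):
        raise ValueError(f"not a change-id: {s!r}")
    return bytes.fromhex(s.translate(_FROM_REVERSE))


def bytes_to_change_id(raw: bytes) -> str:
    """Encode 16 raw bytes as a 32-digit reverse-hex change-id."""
    if len(raw) != 16:
        raise ValueError("a change-id is exactly 16 bytes")
    return raw.hex().translate(_TO_REVERSE)


def new_change_id() -> str:
    """Assumption: a fresh change-id is 16 random bytes."""
    return bytes_to_change_id(secrets.token_bytes(16))


print(change_id_to_bytes("ywlktllmukprnxnmzzprukpuwyztylwt").hex())
# 13ef6eed5fa8c2cd00a85fa531061e36
```

Because the alphabet is disjoint from ordinary hex (0-9, a-f), a
change-id can never be mistaken for an abbreviated commit hash.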

So, these are mostly practical considerations. If there are notable
benefits to free-form IDs, Jujutsu can hash them to get an ID in its
internal format if necessary. But it's always easier to go from a
strict format to a loose one later than the other way around.
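
As an illustration of hashing a free-form ID into the strict format,
the sketch below digests an arbitrary string and renders the first 16
bytes in reverse hex. The choice of SHA-256 and the function name
`change_id_from_free_form` are assumptions for illustration only; this
is not how Jujutsu is known to do it.

```python
import hashlib

# Reverse-hex alphabet (0 -> z, ..., f -> k), as in Jujutsu's change-ids.
_TO_REVERSE = str.maketrans("0123456789abcdef", "zyxwvutsrqponmlk")


def change_id_from_free_form(value: str) -> str:
    """Hypothetical: map an arbitrary ID string to 32 reverse-hex digits.

    Assumption: SHA-256 over the UTF-8 bytes, truncated to 16 bytes.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()[:16]
    return digest.hex().translate(_TO_REVERSE)


print(change_id_from_free_form("my-tool/feature-42"))
```

The mapping is deterministic, so the same free-form ID always yields
the same change-id, but it is one-way: the original string cannot be
recovered from the result.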

> While there may not be a need to do anything in Git itself I would think
> that supporting change IDs natively in Git would still be sensible.
> Sure, you can emulate them via commit trailers. But I don't consider
> trailers to be particularly great as a storage format for this metadata.
> After all, you will want to filter the commit graph by change ID for
> some of the use cases, and doing that based on a loosely-defined format
> probably isn't great.
>
> So what would it take to get change IDs into Git? I think the most
> important items would be:
>
>   - Generating and writing change IDs in commands that support them.
>     This includes e.g. git-commit(1), git-commit-tree(1), git-merge(1),
>     git-merge-tree(1). This should of course be completely optional and
>     probably be disabled by default.
>
>   - Making tools that rewrite commits aware of change IDs so that they
>     know to retain change IDs. This involves e.g. git-cherry-pick(1),
>     git-rebase(1), git-replay(1).
>
>   - Extending revisions to allow specifying commits by change ID.
>
>   - Allowing us to filter commit graphs by change ID.

I agree with all of that. The first two points are the ones that would
actually allow the ecosystem to start relying on this new header as a
standard and to develop related features while staying interoperable
with the rest of the ecosystem. For example, if git-rebase preserved
the header, Git and Jujutsu users would enjoy stable change-ids when a
branch is rebase-merged on a forge. And if Git generated the header
itself in git-commit, Gerrit could drop its requirement that clients
generate a change-id footer via their commit-msg hook.
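
Finding commits by change ID, the last item in Patrick's list, only
requires reading the commit headers, which end at the first empty line
of the raw object. A minimal Python sketch, assuming raw commit objects
as printed by `git cat-file commit <rev>`; the helper names and the
sample commit (including where `change-id` sits among the headers) are
hypothetical.

```python
from __future__ import annotations


def parse_commit_headers(raw: str) -> list[tuple[str, str]]:
    """Split a raw commit object into (key, value) header pairs.

    Headers end at the first empty line; the commit message follows.
    A line starting with a single space continues the previous header
    (this is how multi-line headers such as `gpgsig` are stored).
    """
    headers: list[tuple[str, str]] = []
    for line in raw.split("\n"):
        if line == "":
            break
        if line.startswith(" ") and headers:
            key, value = headers[-1]
            headers[-1] = (key, value + "\n" + line[1:])
        else:
            key, _, value = line.partition(" ")
            headers.append((key, value))
    return headers


def change_id_of(raw: str) -> str | None:
    """Return the value of the `change-id` header, or None if absent."""
    for key, value in parse_commit_headers(raw):
        if key == "change-id":
            return value
    return None


def filter_by_change_id(commits: dict[str, str], change_id: str) -> list[str]:
    """Given {commit_id: raw_object}, return the ids carrying change_id."""
    return [oid for oid, raw in commits.items() if change_id_of(raw) == change_id]


# Hypothetical sample object; identity and timestamps are placeholders.
SAMPLE = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author A U Thor <author@example.com> 1743676732 +0200\n"
    "committer A U Thor <author@example.com> 1743676732 +0200\n"
    "change-id ywlktllmukprnxnmzzprukpuwyztylwt\n"
    "\n"
    "Add feature\n"
)

print(change_id_of(SAMPLE))  # ywlktllmukprnxnmzzprukpuwyztylwt
```

Scanning every commit like this is linear in history size; native
support in Git would presumably want an index instead, which is part of
the case for a well-defined header over a loosely formatted trailer.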

> The biggest question is of course backwards compatibility -- can we
> introduce a change ID into the commit metadata without breaking existing
> users? I guess you'll already have a lot of experience with this given
> that you essentially already inject change IDs into metadata, and tools
> generally handle this just fine?

Jujutsu has been injecting a 'jj:trees' header into commits to track
more metadata around merge conflicts. There weren't any problems with
that, unless one uses Git to rewrite these commits, e.g. with
git-rebase, in which case that header is simply lost. But commits with
conflicts are usually not pushed to a remote anyway, so the risk there
was minimal. Scott Chacon of GitButler has more experience in this
regard, since GitButler actually pushes commits with a change-id in
their headers to remotes. He told the Jujutsu community that they
hadn't encountered any problems: no misbehaving tools that are fussy
about unknown headers. The only problem is that Git itself drops
unknown commit headers, depending on how the remote invokes it.
(GitHub seems to preserve the header during a rebase-merge, because it
uses git-replay. GitLab and Forgejo drop the header.) With these
insights from Scott, Jujutsu is moving forward with putting the
change-id in the commit header.
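
Whether a forge preserves or drops the header comes down to whether the
rewriting tool copies headers it does not recognize. The Python sketch
below rebuilds a commit on a new tree and parent, with a hypothetical
`preserve_unknown` flag making that choice explicit. It is a
simplification, not how git-rebase or git-replay are implemented: it
assumes the original commit has at least one parent and collapses its
parents to a single new one, leaves `author` and `committer` untouched,
and drops any `gpgsig` header because the old signature would no longer
match the rewritten content.

```python
from __future__ import annotations

# Headers this sketch treats as standard. Anything else, such as
# `change-id` or `jj:trees`, counts as unknown.
KNOWN = {"tree", "parent", "author", "committer", "encoding"}


def split_headers(raw: str) -> tuple[list[list[str]], str]:
    """Group header lines into logical headers (a first line plus any
    continuation lines starting with a space) and return the message."""
    header_text, _, message = raw.partition("\n\n")
    groups: list[list[str]] = []
    for line in header_text.split("\n"):
        if line.startswith(" ") and groups:
            groups[-1].append(line)
        else:
            groups.append([line])
    return groups, message


def rewrite_commit(raw: str, new_tree: str, new_parent: str,
                   preserve_unknown: bool = True) -> str:
    """Return a raw commit object moved onto new_tree / new_parent."""
    groups, message = split_headers(raw)
    out: list[str] = []
    parent_written = False
    for group in groups:
        key = group[0].split(" ", 1)[0]
        if key == "tree":
            out.append(f"tree {new_tree}")
        elif key == "parent":
            # Replace the old parent line(s) with the single new parent.
            if not parent_written:
                out.append(f"parent {new_parent}")
                parent_written = True
        elif key == "gpgsig":
            continue  # the old signature would not verify any more
        elif key in KNOWN or preserve_unknown:
            out.extend(group)  # copy the header verbatim
        # Otherwise the unknown header is silently dropped.
    return "\n".join(out) + "\n\n" + message


# Hypothetical sample; the object ids are placeholders.
SAMPLE = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "parent 1111111111111111111111111111111111111111\n"
    "author A U Thor <author@example.com> 1743676732 +0200\n"
    "committer A U Thor <author@example.com> 1743676732 +0200\n"
    "change-id ywlktllmukprnxnmzzprukpuwyztylwt\n"
    "\n"
    "Add feature\n"
)
NEW_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
NEW_PARENT = "2222222222222222222222222222222222222222"

kept = rewrite_commit(SAMPLE, NEW_TREE, NEW_PARENT)
lost = rewrite_commit(SAMPLE, NEW_TREE, NEW_PARENT, preserve_unknown=False)
print("change-id" in kept, "change-id" in lost)  # True False
```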

> NB: I'm also quite happy that Jujutsu brings a bit of a new contender
>     to Git into the picture. It has a lot of nice ideas, and in the best
>     case Git might be able to learn a few nice tricks from JJ. After
>     all, I think we can all benefit from some friendly competition.

One of the best features of Jujutsu is that it plays nice with Git.
Most users work in "colocated" repos that have both a .git and a .jj
directory. Git commands continue to work as usual (mostly). If Git
adopted the change-id header, interleaving Git and Jujutsu commands
would work even better.

So, +1 from me for friendly competition / collaboration. :-)

Remo