Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer

"Theodore Ts'o" <tytso@xxxxxxx> · Tue, 8 Apr 2025 08:55:21 -0400

On Mon, Apr 07, 2025 at 04:36:01PM -0500, Nico Williams wrote:
> This is why I suggested earlier that there need to be multiple change
> IDs, not just one.  Perhaps one is a "code review ID" and another is
> a "commit change ID".  The code review ID would let you link together
> all commits that were reviewed together, so if you have to split or
> squash commits they would all still have that one code review ID.  The
> commit change ID would be shared by all sufficiently-similar versions of
> a commit.  If a commit is dropped or split or squashed then its commit
> ID might get dropped too, but the code review ID would stay the same.

I think "code review ID" makes a lot of sense, although what I would
call it is "patch series ID".  This has very clear semantic: it ties
commits which should be grouped together as a single higher-level set
of changes.  It could be used by "git format-patch" / "git send-email"
to automatically send a group of patches as a logical unit.

I'd include the "patch series ID" in the e-mail that gets sent out, so
that "git apply-message" would be able to retain the patch series ID.
Patchwork could use the Patch series ID to automatically mark a v2
version of patch series as obsoleting the v1 version of the patch
series.  So it would be a lot more useful for than just for
Gerrit-style workflows, and that's a good sign that feature makes
sense from a design perspective.

I'll note that even without the "commit change ID", just simply
knowing that one patch series is a newer version of a pre-existing
patch series is enough to allow Gerrit to intuit which commit is a
newer version of another commit.  For singleton commits, nothing else
is necessary.  For multi-commit patch series, gerrit could use the
one-line commit description to associate commits; it could use
ordering of the patches; it could just see which commit contents are
similar to previous commits, much like how git detects renames.

In my experience looking at how kernel developers use gerrit versus
e-mail workflows, in general, gerrit patch series tend to involve a
smaller number of commits, because looking at how various files change
between commtis is awkward; and with e-mail workflows, the patch
series tend involve a larger number of commits, because reviewing
smaller commits is easier with e-mail.

So if this true for other communities using web-based review
workflows, using an hueristics instead of a "commit change ID" might
be sufficient --- and for those communities that run into problems,
they could continue to use a gerrit-style "Change-ID: " in the footer,
with the hueristics being used if for some reason commits that don't
have the Change-ID make it into Gerrit.

> > Quite frankly, I think the concept of "change ID" is nice but it is
> > not mechanically trustable.  Recording them in the trailers is fine,
> > but I somehow feel that they have a clear-cut semantics everybody
> > can agree on to deserve to be in the header part of commit objects.
> 
> I don't think they need to have such extremely detailed semantics in
> order to be able to get a header.  The semantics will ultimately be
> somewhat project-defined, typically something like "during code review
> you can use these to related newer updates to an MR/PR/CR to older
> versions" and "once integrated you can use these to find the approved
> code review as follows [details]".  The [details] (probably a URI
> template) for finding concluded CRs might vary.  The CR tool might vary.
> The construction of the change IDs might vary.  The intent might not
> vary at all.

I disagree.  From long experience, allowing something into an
interface that doesn't have strongly defined semantics has lead to
*huge* problems.  This has certainly been the case for
Kernel<->Userspace interfaces; so my bias is that if we can't define
strong semantics, then we should probably avoid adding that interface
until we can.  Otherwise, this can lead to a huge number of headaches,
both for developers and users.

People *will* develop automation tools suing an official "commit
change ID", assuming that how their project (or their forge site) uses
the ill-defined Change ID is the One True Way that the badly defined
field should be used.  And other people will developer *other* tools
assuming some other interpreation for that field.  And then the git
developers and users will be left trying to pick up the pieces.

Cheers,

						- Ted