Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer

Elijah Newren <newren@xxxxxxxxx> · Thu, 3 Apr 2025 19:28:39 -0700

On Thu, Apr 3, 2025 at 9:40 AM Remo Senekowitsch <remo@xxxxxxxxxxx> wrote:
>
> On Thu Apr 3, 2025 at 5:39 PM CEST, Elijah Newren wrote:
> > On Wed, Apr 2, 2025 at 11:48 AM Martin von Zweigbergk
> > <martinvonz@xxxxxxxxxx> wrote:
> >>
> >> There are many benefits to having a change id even if it's just
> >> local. I mentioned some in my email to this mailing list in [1].
> >> For example, it enables
> >> `git rebase main <change ID>; git switch <change ID>` without
> >> requiring the user to look up the hash of the rewritten commit.
> >
> > But <change ID> isn't unique, right?  The whole point of having the
> > change ID is to preserve it despite edits (e.g. rebase, commit
> > --amend, cherry-pick), meaning that you end up with multiple commits
> > with the same <change ID>.
> >
> > Why would this work?
> >
> > And if it does work, isn't it expensive since you'd need to walk
> > history to find it?  Or do you keep an extra lookup table on the side
> > somewhere?
>
> For rebase and commit --amend, the way Jujutsu deals with those is that
> all descendants are immediately rebased on top of the new commit, and
> refs to those descendants are updated as well. That means the old
> version of the patch with the same change-id becomes unreachable. So,
> at least most of the time, the change-id is indeed unique.
>
> This doesn't work for cherry-pick, more on that below.
>
> Some of these features are not in Git yet, at least not to my knowledge.
> That means getting the full benefit of change-ids with Git itself
> would indeed require some more work. I know of rebase.updateRefs
> and rebase.rebaseMerges, which move the Git experience closer to
> Jujutsu, but don't go all the way. AFAIK it's not possible with Git to
> automatically rebase --update-refs all descendants of a commit that is
> amended or rebased.

Correct; that doesn't exist currently.
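
To make the Jujutsu behaviour Remo describes concrete, here is a toy model. It is a hypothetical sketch, not jj's or Git's actual code, and the names `Repo`, `new_commit`, and `amend` are made up for illustration. Amending a commit creates a new commit with the same change-id. Every reachable descendant is then rewritten onto it in topological order, and refs are moved. The old versions drop out of reachable history, so the change-id stays unique among reachable commits.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    id: str
    change_id: str   # stays the same across rewrites
    parents: tuple   # ids of parent commits
    message: str


class Repo:
    """Toy repository model (hypothetical; not Git's or Jujutsu's data structures)."""

    def __init__(self):
        self.commits = {}  # commit id -> Commit (old versions are kept, like objects)
        self.refs = {}     # ref name -> commit id
        self._ids = itertools.count(1)

    def new_commit(self, change_id, parents, message):
        c = Commit(f"c{next(self._ids)}", change_id, tuple(parents), message)
        self.commits[c.id] = c
        return c.id

    def reachable_topo(self):
        """Commits reachable from any ref, parents listed before children."""
        order, seen = [], set()

        def visit(cid):
            if cid in seen:
                return
            seen.add(cid)
            for p in self.commits[cid].parents:
                visit(p)
            order.append(cid)  # post-order: all parents already appended

        for head in self.refs.values():
            visit(head)
        return order

    def amend(self, old_id, new_message):
        """Rewrite old_id, rebase every reachable descendant onto it, move refs."""
        old = self.commits[old_id]
        mapping = {old_id: self.new_commit(old.change_id, old.parents, new_message)}
        for cid in self.reachable_topo():
            c = self.commits[cid]
            if cid not in mapping and any(p in mapping for p in c.parents):
                new_parents = [mapping.get(p, p) for p in c.parents]
                mapping[cid] = self.new_commit(c.change_id, new_parents, c.message)
        for name in list(self.refs):
            self.refs[name] = mapping.get(self.refs[name], self.refs[name])
        return mapping
```

With a chain A <- B <- C and refs on B and C, amending B yields new B and C commits. Both refs move, and each change-id appears exactly once in reachable history. What Git lacks today is the "rewrite all descendants and move their refs" step in `amend`.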

> Jujutsu does keep a separate index of change-ids, yes.

Thanks.
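
A side index like the one Remo mentions might look roughly like the sketch below. It is hypothetical and not Jujutsu's actual implementation or on-disk format. It maps each change-id to every commit that carries it. Resolving a change-id is then only well-defined when exactly one of those commits is reachable. The index avoids scanning history on every lookup, though deciding reachability still has a cost.

```python
from collections import defaultdict


class ChangeIdIndex:
    """Hypothetical side table mapping change-id -> every commit id carrying it."""

    def __init__(self):
        self._by_change = defaultdict(set)

    def add(self, change_id, commit_id):
        self._by_change[change_id].add(commit_id)

    def candidates(self, change_id):
        # .get() avoids creating empty entries in the defaultdict
        return set(self._by_change.get(change_id, ()))

    def resolve(self, change_id, reachable):
        """Resolve change_id to one commit, considering only `reachable` commits.

        Raises LookupError if no reachable commit, or more than one, carries it.
        """
        hits = sorted(self.candidates(change_id) & set(reachable))
        if not hits:
            raise LookupError(f"no reachable commit has change-id {change_id}")
        if len(hits) > 1:
            raise LookupError(f"change-id {change_id} is ambiguous: {', '.join(hits)}")
        return hits[0]
```

After an amend-and-rebase, the old commit is still in the index but no longer reachable, so lookup succeeds. With N+1 backports all reachable from live branches, the same lookup reports ambiguity.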

> >> There is a design doc [2] about the impact on Gerrit and how to
> >> handle various cases where the client doesn't understand the
> >> `change-id` header. That also includes some discussion about
> >> whether cherry-picking should preserve the change id or create a
> >> new one. I think there is a lot of value in having a
> >> standardized header regardless of what we decide about
> >> cherry-picks.
> >
> > cherry-pick & rebase preserve author name, email & time, while
> > creating a new committer name, email, & time.  To me, the change-id is
> > about the authorship, and since these commands already preserve
> > authorship, it'd seem weird to me to have cherry-pick not preserve the
> > change-id by default.
>
> I'd say Jujutsu, Gerrit and GitButler think of a change-id as associated
> with a unit of review. (Although it will naturally support reviewing
> sets of patches as well.) Usually only one person will push commits with
> the same change-id, just like people don't usually force-push over each
> other's branches. But that's mostly about avoiding logistical problems.
> When an employee leaves a company or is on vacation, it can be perfectly
> reasonable for someone else to take over their work. In that case, it
> would be appropriate to preserve the change-id, even though authorship
> has changed, because the history of code review on that patch should
> stay associated with the new version.
>
> Cherry-picking on the other hand often represents a separate unit of
> review. That review may revolve around whether it makes sense to
> backport a bugfix at all or any additional changes that may have been
> necessary to make the bugfix work in the different, older codebase.

I've worked with many projects hosted in Gerrit, and they all had a
very different view of change-ids than what you've espoused here.
They cherry-picked changes to other branches, fully expecting the
change-id to be kept the same.  They often checked to verify that
important fixes had been backported to all the relevant LTS branches
by looking for the change-id.  So, we'd typically have N+1 commits
sharing the same change-id, all reachable from existing branches,
where N is the number of LTS versions still supported at the time (and
the +1 comes from the main branch development).
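
The backport check described above amounts to asking which branches lack a commit carrying a given Gerrit `Change-Id:` trailer. Gerrit's trailer is `Change-Id: I` followed by 40 hex digits. Below is a rough sketch that assumes the commit messages for each branch have already been collected. The function names and the `branch_logs` input are hypothetical. The regex also simplifies trailer parsing: it matches the line anywhere in the message, not only in the final trailer block.

```python
import re

# Gerrit's Change-Id trailer: "Change-Id: I" followed by 40 lowercase hex digits.
CHANGE_ID_RE = re.compile(r"^Change-Id:[ \t]*(I[0-9a-f]{40})[ \t]*$", re.MULTILINE)


def change_ids(message):
    """Change-Ids found in one commit message (simplified trailer parsing)."""
    return set(CHANGE_ID_RE.findall(message))


def branches_missing(change_id, branch_logs):
    """Branches with no commit carrying change_id.

    branch_logs: hypothetical mapping of branch name -> list of commit messages
    reachable from that branch.
    """
    return sorted(
        branch
        for branch, messages in branch_logs.items()
        if not any(change_id in change_ids(m) for m in messages)
    )
```

The messages for one branch could be gathered with something like `git log -z --format=%B <branch>` and split on NUL bytes. `git interpret-trailers --parse` would be a more faithful way to extract the trailers themselves.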

> As mentioned above, there's also the issue that preserving the change-id
> on cherry-pick likely results in duplicates. For Jujutsu, it would be
> nice if this were avoided. But it's not infeasible to deal with that
> either.
>
> For Gerrit, it would be important to be able to track a change across
> cherry-picks somehow, since that is a feature they already have. If Git
> decides to preserve the change-id on cherry-pick, there's no problem
> for Gerrit. Alternatives include storing a separate cherry-picked-from
> header or enabling the -x flag on cherry-pick by default.

Cherry-picked-from trailers can be nice when they exist, but far more
often than one would like they lead to dead ends.  People will
cherry-pick a commit that was local-only, or only found in some
security-embargoed repository, and the trailer then points nowhere.  You
also occasionally get chains: E cherry-picked from D, which was
cherry-picked from C, which was cherry-picked from B, etc.  And more
complex structures are possible.  And maybe part of that chain was a
local-only commit or some commit from a security-embargoed repository
that you don't have access to.  Then folks get to write scripts and
try to deduce relationships from those trailers (e.g. hey, these two
commits both claim they were cherry-picked from the same non-existent
commit, and this other commit was a cherry-pick of one of these two,
so they're a representation of the same logical change on these
different LTS branches).  It makes it a hassle to try to determine
which LTS branches have the appropriate fixes backported and applied.
I've done it, but I thought this problem was logically the point of
change-ids as found in Gerrit, honestly (well, that and its byzantine
push to refs/for/$BRANCH stuff so it could automagically determine
which CR that your push was supposed to be correlated with instead of
just letting you specify via a real refname in your push command).
While I understand that having nearly-unique change-ids lets you use
change-ids interchangeably with commits, that seems like a questionable
benefit over being able to actually track which logical changes are
the same and have been applied to which LTS branches.  I fully realize
folks may disagree...but if we're suggesting commands like `git switch
<change-id>` which can only possibly be meaningful if <change-id> is
unique across all branches, then what are we supposed to do for the
many projects which use change-ids for LTS backport tracking?  What
does `git switch <change-id>` (and any other command where you attempt
to use a non-unique change-id in place of a unique commit identifier)
do for them?
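
The kind of script described above can be sketched as a union-find (disjoint-set) pass over the `(cherry picked from commit <hash>)` lines that `git cherry-pick -x` appends. Each commit is merged into the same set as every commit it claims to have been picked from, including commits that don't exist locally. As a result, two picks of the same missing commit still end up grouped. This is a hypothetical helper, not an existing Git tool. It assumes the trailers carry full hashes that match the keys of the input.

```python
import re

# The line `git cherry-pick -x` appends to the commit message.
PICKED_RE = re.compile(r"^\(cherry picked from commit ([0-9a-f]+)\)$", re.MULTILINE)


class DisjointSet:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def logical_changes(commits):
    """Group commit ids that -x trailers link, even through missing commits.

    commits: mapping of commit id -> commit message (hypothetical input).
    Returns a sorted list of sorted groups of the commit ids we actually have.
    """
    ds = DisjointSet()
    for cid, message in commits.items():
        ds.find(cid)
        for source in PICKED_RE.findall(message):
            ds.union(cid, source)
    groups = {}
    for cid in commits:
        groups.setdefault(ds.find(cid), []).append(cid)
    return sorted(sorted(g) for g in groups.values())
```

This can only recover relationships that someone recorded. A commit picked without `-x` silently starts a new group, which is part of the appeal of a change-id that survives cherry-pick.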