Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer

Elijah Newren <newren@xxxxxxxxx> · Thu, 3 Apr 2025 19:40:36 -0700

On Thu, Apr 3, 2025 at 7:28 PM Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Thu, Apr 3, 2025 at 9:40 AM Remo Senekowitsch <remo@xxxxxxxxxxx> wrote:
> >
> > On Thu Apr 3, 2025 at 5:39 PM CEST, Elijah Newren wrote:
> > > On Wed, Apr 2, 2025 at 11:48 AM Martin von Zweigbergk
> > > <martinvonz@xxxxxxxxxx> wrote:
> > >>
> > >> There are many benefits to having a change id even if it's just
> > >> local. I mentioned some in my email to this mailing list in [1].
> > >> For example, it enables
> > >> `git rebase main <change ID>; git switch <change ID>` without
> > >> requiring the user to look up the hash of the rewritten commit.
> > >
> > > But <change ID> isn't unique, right?  The whole point of having the
> > > change ID is to preserve it despite edits (e.g. rebase, commit
> > > --amend, cherry-pick), meaning that you end up with multiple commits
> > > with the same <change ID>.
> > >
> > > Why would this work?
> > >
> > > And if it does work, isn't it expensive since you'd need to walk
> > > history to find it?  Or do you keep an extra lookup table on the side
> > > somewhere?
> >
> > For rebase and commit --amend, the way Jujutsu deals with those is that
> > all descendants are immediately rebased on top of the new commit, and
> > refs to those descendants are updated as well. That means, the old
> > version of the patch with the same change-id becomes unreachable. So,
> > at least most of the time, the change-id is indeed unique.
> >
> > This doesn't work for cherry-pick, more on that below.
> >
> > Some of these features are not in Git yet, at least not to my knowledge.
> > That means getting the full benefit of change-ids with Git itself
> > would indeed require some more work. I know of rebase.updateRefs
> > and rebase.rebaseMerges, which move the Git experience closer to
> > Jujutsu, but don't go all the way. AFAIK it's not possible with Git to
> > automatically rebase --update-refs all descendants of a commit that is
> > amended or rebased.
>
> Correct; that doesn't exist currently.
>
> > Jujutsu does keep a separate index of change-ids, yes.
>
> Thanks.
>
> > >> There is a design doc [2] about the impact on Gerrit and how to
> > >> handle various cases where the client doesn't understand the
> > >> `change-id` header. That also includes some discussion about
> > >> whether cherry-picking should preserve the change id or create a
> > >> new one. I think there is a lot of value in having a
> > >> standardized header regardless of what we decide about
> > >> cherry-picks.
> > >
> > > cherry-pick & rebase preserve author name, email & time, while
> > > creating a new committer name, email, & time.  To me, the change-id is
> > > about the authorship, and since these commands already preserve
> > > authorship, it'd seem weird to me to have cherry-pick not preserve the
> > > change-id by default.
> >
> > I'd say Jujutsu, Gerrit and GitButler think of a change-id as associated
> > with a unit of review. (Although it will naturally support reviewing
> > sets of patches as well.) Usually only one person will push commits with
> > the same change-id, just like people don't usually force-push over each
> > others' branches. But that's mostly about avoiding logistical problems.
> > When an employee leaves a company or is on vacation, it can be perfectly
> > reasonable for someone else to take over their work. In that case, it
> > would be appropriate to preserve the change-id, even though authorship
> > has changed, because the history of code review on that patch should
> > stay associated with the new version.
> >
> > Cherry-picking on the other hand often represents a separate unit of
> > review. That review may revolve around whether it makes sense to
> > backport a bugfix at all or any additional changes that may have been
> > necessary to make the bugfix work in the different, older codebase.
>
> I've worked with many projects hosted in Gerrit, and they all had a
> very different view of change-ids than what you've espoused here.
> They cherry-picked changes to other branches, fully expecting the
> change-id to be kept the same.  They often checked to verify that
> important fixes had been backported to all the relevant LTS branches
> by looking for the change-id.  So, we'd typically have N+1 commits
> sharing the same change-id, all reachable from existing branches,
> where N is the number of LTS versions still supported at the time (and
> the +1 comes from the main branch development).
>
> > As mentioned above, there's also the issue that preserving the change-id
> > on cherry-pick likely results in duplicates. For Jujutsu, it would be
> > nice if this was avoided. But it's not infeasible to deal with that
> > either.
> >
> > For Gerrit, it would be important to be able to track a change across
> > cherry-picks somehow, since that is a feature they already have. If Git
> > decides to preserve the change-id on cherry-pick, there's no problem
> > for Gerrit. Alternatives include storing a separate cherry-picked-from
> > header or enabling the -x flag on cherry-pick by default.
>
> Cherry-picked-from trailers can be nice when it exists, but much more
> frequently than one would want it provides a dead-end.  People will
> cherry-pick a commit that was local-only, or only found in some
> security-embargoed repository, and you'd end up with dead ends.  You
> also occasionally get chains: E cherry-picked from D, which was
> cherry-picked from C, which was cherry-picked from B, etc.  And more
> complex structures are possible.  And maybe part of that chain was a
> local-only commit or some commit from a security-embargoed repository
> that you don't have access to.  Then folks get to write scripts and
> try to deduce relationships from those trailers (e.g. hey, these two
> commits both claim they were cherry-picked from the same non-existent
> commit, and this other commit was a cherry-pick of one of these two,
> so they're a representation of the same logical change on these
> different LTS branches).  It makes it a hassle to try to determine
> which LTS branches have the appropriate fixes backported and applied.
> I've done it, but I thought this problem was logically the point of
> change-ids as found in Gerrit, honestly (well, that and its byzantine
> push to refs/for/$BRANCH stuff so it could automagically determine
> which CR that your push was supposed to be correlated with instead of
> just letting you specify via a real refname in your push command).
> While I understand that having nearly-unique change-ids lets you use
> change-ids interchangeably with commits, that seems like a questionable
> benefit over being able to actually track which logical changes are
> the same and have been applied to which LTS branches.  I fully realize
> folks may disagree...but if we're suggesting commands like `git switch
> <change-id>` which can only possibly be meaningful if <change-id> is
> unique across all branches, then what are we supposed to do for the
> many projects which use change-ids for LTS backport tracking?  What
> does `git switch <change-id>` (and any other command where you attempt
> to use a non-unique change-id in place of a unique commit identifier)
> do for them?

One possible simple solution here is just to treat change-ids (or
their abbreviations) kind of like abbreviated hashes -- they aren't
guaranteed to be unique.  If the user specifies a change-id and there
are multiple branches with such a change-id, we provide the user an
error much like we do for abbreviated hashes.

Is that what folks have in mind?  If so, I'll be happy to drop my
reservations about this aspect.
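
For concreteness, here is a rough Python sketch of that lookup behavior.
It is not Git code: `Commit`, `AmbiguousChangeId`, and `resolve_change_id`
are made-up names, and the in-memory list stands in for whatever index of
change-ids an implementation would actually keep.

```python
from dataclasses import dataclass


class AmbiguousChangeId(Exception):
    """Raised when a change-id (or its prefix) matches several commits."""


@dataclass(frozen=True)
class Commit:
    # Hypothetical record; a real implementation would read these from
    # the object database or a side index.
    commit_hash: str
    change_id: str
    branch: str


def resolve_change_id(prefix, commits):
    """Return the single commit whose change-id starts with `prefix`.

    Mirrors how abbreviated hashes are handled: no match is an error,
    and more than one distinct matching commit is an ambiguity error
    that lists the candidates instead of picking one.
    """
    matches = [c for c in commits if c.change_id.startswith(prefix)]
    if not matches:
        raise KeyError(f"no commit with change-id prefix {prefix!r}")
    if len({c.commit_hash for c in matches}) > 1:
        candidates = ", ".join(
            sorted(f"{c.commit_hash} ({c.branch})" for c in matches)
        )
        raise AmbiguousChangeId(
            f"change-id {prefix!r} is ambiguous; candidates: {candidates}"
        )
    return matches[0]


# A fix backported to one LTS branch shares its change-id with main,
# so the change-id alone cannot name a single commit.
commits = [
    Commit("a1b2c3", "Iabc123", "main"),
    Commit("d4e5f6", "Iabc123", "lts-1.0"),
    Commit("0718aa", "Idef456", "main"),
]
```

With this data, `resolve_change_id("Idef", commits)` returns the one
matching commit, while `resolve_change_id("Iabc", commits)` raises
`AmbiguousChangeId` and names both candidates, which is the case the
LTS-backport projects would hit.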