Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)

Martin von Zweigbergk <martinvonz@xxxxxxxxxx> · Sat, 10 May 2025 13:31:32 -0700

Hi,

On Sat, 10 May 2025 at 12:46, D. Ben Knoble <ben.knoble@xxxxxxxxx> wrote:
>
> On a re-read of
> https://lore.kernel.org/git/CANiSa6gwup5vXU235mG+Ybbc+P=SbwoNFEmuhg=iYu0yGvSXVA@xxxxxxxxxxxxxx/,
> I see that change IDs were motivated partly by identifying (related?)
> commits after rewrites. I can certainly see how it would be nice to
> track down how a commit I'm working on evolved; I can even imagine
> most of the problems brought up in this thread wrt splitting or
> combining commits (not to mention, say, cherry-picks where the
> committer makes non-trivial changes to the patch).
>
> There was also a note about using a change ID to identify a code
> review in supporting tools. Neat!
>
> I'll leave it to someone else to summarize the open questions? (I now
> have a few of my own about how tools in Git's ecosystem respond to…
> unexpected… headers.)
>
> In the meantime, I think I'll repost this, since I'm not sure I ever
> got clarity:
>
> Re-reading the original post [1] (which didn't mention this kind of
> ID?), I'm having a hard time seeing the problem statement. There's a
> lot said here about the specifics of the solution, and some other neat
> things it might unlock… meanwhile, I'm wondering if all the
> consternation about change IDs is because the problem being solved is
> underspecified for a core Git feature? (That might tie to Ted's
> initial concerns about semantic meaning, on which I think I concur:
> the parent and committer/author headers have unambiguous meaning to
> Git, independent of anything else.)
>
> It looks to me, an outsider, like the problem is some combination of
> "I want to track a commit's evolution" and "I want to see related
> commits in review, esp. when it's an identical and already-approved
> commit." But I might be misreading, and clarifying the problem
> statement might help bring us to a better core solution?

To me, the main benefit is being able to refer to an evolving change
by a stable ID. That enables things like `jj describe qx -m 'new
description'; jj new qx` (update commit message, then switch to it)
without having to look up the new commit ID after setting the
description. That's sufficient benefit for me, and I think most
Jujutsu users would agree. It's basically the only benefit we've
gotten from it so far, since we have not started transferring it to
remotes. (There are other minor benefits, like being able to point out
to the user that they have two related commits, so they may want to
delete one or somehow combine them.)
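
As a rough illustration of why a stable ID helps in the `jj describe`
example above, here is a toy Python model. It is only a sketch and not
how Jujutsu is implemented: the `ToyRepo` class, its methods, and the
hashing scheme are all made up for illustration. The point is just
that the commit ID changes when the description changes, while the
change ID keeps resolving to the current commit.

```python
import hashlib
import uuid


class ToyRepo:
    """Hypothetical model: commit IDs are content hashes, change IDs are stable."""

    def __init__(self):
        self.commits = {}  # commit ID -> (change ID, description)
        self.visible = {}  # change ID -> current commit ID

    def _commit_id(self, change_id, description):
        # Any change to the description yields a different commit ID.
        data = f"{change_id}\n{description}".encode()
        return hashlib.sha1(data).hexdigest()

    def create(self, description):
        change_id = uuid.uuid4().hex  # random, not derived from content
        commit_id = self._commit_id(change_id, description)
        self.commits[commit_id] = (change_id, description)
        self.visible[change_id] = commit_id
        return change_id

    def describe(self, change_id, description):
        # Rewriting creates a new commit that keeps the same change ID.
        commit_id = self._commit_id(change_id, description)
        self.commits[commit_id] = (change_id, description)
        self.visible[change_id] = commit_id
        return commit_id

    def resolve(self, change_id):
        # The change ID always points at the latest rewrite.
        return self.visible[change_id]


repo = ToyRepo()
change = repo.create("old description")
before = repo.resolve(change)
repo.describe(change, "new description")
after = repo.resolve(change)  # no need to look up the new commit ID
```

Here `before` and `after` are different commit IDs, but the caller
only ever had to remember `change`.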

Given that we already have this stable ID, it would be nice to also
transfer it to remotes and have it be preserved by the remote,
including when the remote rewrites the commit. If we can use it for
things like identifying a code review so we don't need to link it
using a `Change-Id:` commit footer, then that's even better.

If we instead had something like Mercurial's Changeset Evolution
(explicitly recording how commits have evolved), then we could have a
similar identifier that was based on the original version of a commit.
To make lookup by this kind of change ID faster, we could have an
index from commit ID to change ID (i.e. original commit ID). This
seems to imply a commit can have 0 or 1 predecessors (0 for brand new
commits, 1 for rewrites), which is different from Mercurial's
Changeset Evolution, but not necessarily bad. For this kind of change
ID to be the same across repos, and assuming the predecessor pointer
is stored in the commit, we need to make sure to transfer all commits
back to the original commit when we push to a remote. As I think we've
talked about before here, that can be problematic because the user has
to be careful to check that the intermediate commits did not have
anything sensitive in them. It's also often wasteful to share all the
intermediate commits with other developers. Another option is to
transfer the predecessor pointer outside of the commit object. That
has its own problems, like being able to create cycles in the
predecessor graph.
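
To make the predecessor-based variant a bit more concrete, here is a
minimal Python sketch, assuming each commit has at most one
predecessor and that the pointers are available as a plain dict. The
names `original_commit` and `build_change_index` are hypothetical and
do not correspond to anything in Git, Mercurial, or Jujutsu. The cycle
check only matters if the pointers live outside the commit object;
when a pointer is part of the hashed commit, a commit cannot point at
something that did not exist yet when it was created.

```python
def original_commit(commit_id, predecessor):
    """Follow predecessor pointers back to the original commit.

    `predecessor` maps a commit ID to its single predecessor. Brand-new
    commits have no entry. Raises ValueError if the pointers form a cycle.
    """
    seen = set()
    current = commit_id
    while True:
        if current in seen:
            raise ValueError(f"cycle in predecessor graph at {current}")
        seen.add(current)
        previous = predecessor.get(current)
        if previous is None:
            return current  # 0 predecessors: this is the original version
        current = previous


def build_change_index(commits, predecessor):
    """Index from commit ID to change ID (i.e. the original commit ID)."""
    return {c: original_commit(c, predecessor) for c in commits}


# c1 was rewritten to c2, then to c3; d1 is an unrelated new commit.
predecessor = {"c2": "c1", "c3": "c2"}
index = build_change_index(["c1", "c2", "c3", "d1"], predecessor)
```

With this input, `c1`, `c2`, and `c3` all map to `c1`, and `d1` maps to
itself. Note that computing the index requires every intermediate
commit's pointer to be present, which is the transfer problem
described above.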

>
> [1]: https://lore.kernel.org/git/xmqqh62tm5fo.fsf@gitster.g/T/#m038be849b9b4020c16c562d810cf77bad91a2c87
>
> --
> D. Ben Knoble