Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)

"Remo Senekowitsch" <remo@xxxxxxxxxxx> · Wed, 23 Apr 2025 00:24:06 +0200

On Tue Apr 22, 2025 at 10:17 PM CEST, D. Ben Knoble wrote:
> On Wed, Apr 16, 2025 at 7:36 AM Remo Senekowitsch <remo@xxxxxxxxxxx> wrote:
>>
>> On Mon Apr 14, 2025 at 9:54 PM CEST, D. Ben Knoble wrote:
>> > On Wed, Apr 9, 2025 at 12:56 PM Nico Williams <nico@xxxxxxxxxxxxxxxx> wrote:
>> >> Let's nail down the semantics of these change ID headers.  Here is a
>> >> proposal to bang on:
>> >>
>> >>  - change IDs get preserved on cherry-pick and on `pick`s in rebases
>> >>
>> >>  - users can manually remove or change these change IDs, naturally,
>> >>    though generall they would not
>> >>
>> >>  - the actual change IDs are either free-form or they are URIs -- pick
>> >>    one, but if they are URIs they should be URIs to CRs, and approved
>> >>    CRs should perhaps have links to integration reports etc.
>> >
>> > Using URIs [to code reviews] looks to me like it makes some
>> > assumptions about what creates or consumes these headers, right?
>> > Especially since the URI should point to a code review… Is there a way
>> > to do that which is downstream-agnostic?
>> >
>> > Further, and maybe this is my ignorance of Gerrit showing: how would
>> > you attach a URI to a local commit when authoring it? You don't have
>> > the review URI when running `git commit`, do you? (Maybe I
>> > misunderstood; I'm seeing an odd chicken-egg problem here.)
>> >
>> > Which begs another question: what/who applies the initial change ID to
>> > a commit and when?
>>
>> These are all great questions, which the originally proposed format
>> (fixed-width reverse-hex) has answers to. I think a URI would be
>> strictly worse.
>
> Well, I think we still missed "what/who applies the initial change ID
> to a commit and when."

The tool that creates the commit, when it creates the commit. In the
context of this discussion, that's Git or Jujutsu. If the commit is
brand new, generate the change-id randomly. If it's the "spiritual
successor" of another commit, use that commit's change-id.

Btw. since the thread was started, the implementation in Jujutsu has
been completed and I've been pushing commits with the change-id header
to various remotes for a while now. It works well. Forges can start
taking advantage of it. (I hope I find time to help work on that.)

> But the treatment below is something I agree with and failed to
> convey, I think: namely, URIs seem to encode too much
> "unportable"/"specific" information in Git. I feel like the current
> design is not really "tool-agnostic" as much as "built on a universal
> core." That seems valuable and prone to more longevity.
>
>>
>> * Using a reverse-hex ID makes no assumptions about what consumes these
>>   headers. There can be multiple different consumers which treat the ID
>>   differently with different URI schemes.
>>
>> * Attaching a reverse-hex ID to a local commit when authoring it is
>>   trivial: you generate it randomly.
>>
>> This is one of those cases where being maximally restrictive about the
>> format will enable maximal flexibility downstream.
>>
>> One example: GitHub has a URL scheme that looks like this:
>> github.com/org/repo/compare/<ref1>..<ref2>
>>
>> This doesn't work if the refs contain slashes, as branches sometimes do
>> (e.g. feat/foo, username/bar). If the change-id is a URI, this type of
>> URL scheme doesn't work reliably.
>>
>> That is not to say we should design the change-id around GitHub, it's
>> just an example how making the format more free-form (URI is more
>> free-form than fixed-width reverse-hex) makes it more difficult to get
>> stuff working downstream.
>>
>> And lastly, laser-etching the URI scheme of one particular tool into
>> your commit history means the history is at great risk of degrading
>> over time. URI schemes change, domains change, tools become outdated
>> and are replaced.
>>
>> Adding some ephemeral configuration to a tool that constructs a URI out
>> of a reverse-hex ID on the other hand is trivial.
>
> Yep.
>
>> > PS This discussion feels somewhat related to the classic GitHub
>> > problem of not presenting interdiffs/range-diffs: GitHub shows a
>> > too-flat source diff on force-pushes. Perhaps better web UI tooling
>> > about interdiff review (which I think is one of the things Gerrit
>> > does/wants to do?) makes change IDs less necessary, since interdiffs
>> > help connect evolutions of commits?
>>
>> I think it's the other way around: Building a code review UI built on
>> git and centered around interdiffs today is _hard_, that's why we don't
>> have it yet. Adding change-ids to commits will make it much easier,
>> paving the way for these tools to be implemented.
>
> Fair point, although GitHub's detection of force-pushes makes me think
> it could split a PR into versions at that point, cross-link backwards
> and forwards by one version (from the force-push detection), show
> range-diffs between versions based on the target branch of the merge,
> and even follow the cross-links to show an overall sequence of
> versions.
>
> But I don't work there, so presumably it's harder than that :)

I agree that forges still have a lot of potential to improve their code
review UIs even without change-ids. But tracking individual patches
across force-pushes is not as easy as calling git range-diff. Its
manpage says the output is not stable for machines to read and the
algorithm it uses has cubic runtime complexity. Not something I'd want
to use on my backend if the user controls the input. So the next best
thing is to reimplement it using cheaper (but worse) heuristics. The
author timestamp would probably be a good start. But at that point,
you're looking at a lot of work for what users will perceive as an
unreliable and inconsistent experience.

> I sincerely hope to make it easier if it's really that hard! And,
> though my opinion matters little, I'm having a hard time piercing the
> conversation to see a "universal core" that solves the desired
> problem. Maybe I'm not reading carefully enough, and maybe a summary
> would help. I greatly appreciated the work of previous folks to
> summarize the current thread status.

--
Best regards,
Remo