Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer

"Theodore Ts'o" <tytso@xxxxxxx> · Wed, 9 Apr 2025 08:19:24 -0400

On Tue, Apr 08, 2025 at 10:53:06AM -0500, Nico Williams wrote:
> I'm not keen on CR tools "intuiting" from.. similarity checks.  I don't
> love Git's similarity checks for file renames.  I get that for a
> distributed VCS assigning something like "inode numbers" is tricky, but
> as long as devs don't race to create the same files it was always
> possible to have UUIDs as "inode numbers" and avoid the similarity
> checks.

I'm not keen on fields that can have essentially random semantics.
Part of this is because today Change-Id is in the footer, and so
humans can randomly set it to any value they like.  Sometimes they cut
and paste footers, and so completely unrelated commits have the same
Change-Id and all show up when you do a Gerrit lookup by Change-Id.
Admittedly, this aspect gets better if we shove it into the git commit
header.
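
To make the cut-and-paste problem concrete, here is a minimal sketch
(not Gerrit's actual code) that pulls Change-Id footers out of commit
messages and reports any value shared by more than one commit.  The
`commits` mapping and both function names are hypothetical, and the
footer is assumed to be the last paragraph of the message.

```python
import re
from collections import defaultdict

# Matches a "Change-Id: <value>" footer line.
CHANGE_ID_RE = re.compile(r"^Change-Id:\s*(\S+)\s*$", re.MULTILINE)


def change_ids(message):
    """Return the Change-Id values found in the last paragraph of a message."""
    paragraphs = message.strip().split("\n\n")
    return CHANGE_ID_RE.findall(paragraphs[-1])


def shared_change_ids(commits):
    """Map each Change-Id used by more than one commit to those commits.

    `commits` is a hypothetical dict of commit hash -> full commit message.
    """
    by_id = defaultdict(list)
    for sha, message in commits.items():
        for cid in change_ids(message):
            by_id[cid].append(sha)
    return {cid: shas for cid, shas in by_id.items() if len(shas) > 1}


# Two unrelated commits whose footers were cut and pasted.
commits = {
    "a1": "ext4: fix foo\n\nBody.\n\nSigned-off-by: A <a@example.com>\nChange-Id: I1111\n",
    "b2": "docs: fix typo\n\nSigned-off-by: B <b@example.com>\nChange-Id: I1111\n",
    "c3": "net: add bar\n\nChange-Id: I2222\n",
}
duplicates = shared_change_ids(commits)
```

Here `duplicates` contains only "I1111", mapped to the two unrelated
commits, which is exactly the lookup result that is confusing today.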

Part of it is because some tools will edit the Change-Id when doing a
cherry-pick.  (For example, one tool that I'm familiar with, which is a CLI
front-end to Gerrit, when you run the command "kdt cherry-pick", will
unconditionally edit the Change-Id to a completely new value) --- and
some will not, because they just do a "git cherry-pick" without
doing anything else.  And if you live in an ecosystem where some
people use "git cherry-pick", and other people do "kdt cherry-pick",
you basically have *no* guarantees about how Change-Id might behave
for different commits.  This *might* get better if we shove it into a
git commit header, although if you give people tools to edit the
Change-Id as part of a "git commit --amend", some tools might end up
changing the Change-Id in random ways again.

But then we have the problem where if patches get merged or split,
what the Change-Id should be is really undefined today.  I could
imagine that if a commit gets split, both descendant commits should
retain the same Change-Id.  Or maybe if a patch stack gets collapsed,
all of the predecessor Change-Ids should be included in that collapsed
commit,
much like how an "Octopus Merge" might have a half-dozen or more
parent commits.  Defining the semantics here is part of the battle;
the other part of the battle would be how would the tools make sure
these semantics get obeyed.
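
One possible reading of those two rules, as a rough sketch only: a
commit carries a *set* of Change-Ids, a split copies the set to every
descendant, and a collapse takes the union of the predecessors' sets.
The `Commit` type and the `split`/`squash` helpers are hypothetical and
do not correspond to anything git or Gerrit does today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    summary: str
    change_ids: frozenset  # assumption: a commit may carry several Change-Ids


def split(commit, summaries):
    """Split one commit into several; each descendant keeps the same Change-Ids."""
    return [Commit(summary, commit.change_ids) for summary in summaries]


def squash(commits, summary):
    """Collapse a patch stack; the result lists every predecessor Change-Id."""
    merged = frozenset().union(*(c.change_ids for c in commits))
    return Commit(summary, merged)


parser = Commit("add parser", frozenset({"I1"}))
tests = Commit("add tests", frozenset({"I2"}))

halves = split(parser, ["add lexer", "add parser proper"])  # both keep {"I1"}
combined = squash([parser, tests], "add parser with tests")  # carries {"I1", "I2"}
```

Even this toy version shows the cost: once a collapse can accumulate
several IDs, "the" Change-Id of a commit is no longer a single value.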

Perhaps one approach might be that the heuristics that you hate being
used as an automated way to sort it out might instead get used to set
the semantics at commit time, with perhaps a way for the user to
override the heuristics, or where the user has to explicitly
acknowledge that the heuristics correctly noticed that the patch has
changed radically and maybe the Change-Id shouldn't be retained any more?
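
As a hedged illustration of that idea: compare the old and new patch
text when a commit is amended, keep the Change-Id if they are similar
enough, and let the user override either way.  Everything here is an
assumption for the sake of the sketch: the function names, the 0.5
threshold, the line-based difflib comparison, and the placeholder ID
format (which is not Gerrit's scheme).

```python
import difflib
import uuid


def similarity(old_diff, new_diff):
    """Return a ratio in [0, 1]; 1.0 means the two patch texts are identical."""
    matcher = difflib.SequenceMatcher(None, old_diff.splitlines(), new_diff.splitlines())
    return matcher.ratio()


def choose_change_id(old_id, old_diff, new_diff, threshold=0.5, override=None):
    """Pick the Change-Id for an amended commit.

    override=None lets the heuristic decide; True forces keeping the old
    ID; False forces a fresh one.
    """
    if override is None:
        keep = similarity(old_diff, new_diff) >= threshold
    else:
        keep = override
    if keep:
        return old_id
    # Placeholder ID for illustration only.
    return "I" + uuid.uuid4().hex


old = "+a\n+b\n+c\n+d"
tweaked = "+a\n+b\n+c\n+e"   # three of four lines unchanged
rewritten = "-x\n-y"         # nothing in common with the old patch

kept = choose_change_id("I1", old, tweaked)       # heuristic keeps "I1"
fresh = choose_change_id("I1", old, rewritten)    # heuristic assigns a new ID
forced = choose_change_id("I1", old, rewritten, override=True)  # user keeps "I1"
```

The interesting design question is the middle case: where the ratio
sits near the threshold, a tool would presumably have to ask the user
to acknowledge the decision rather than decide silently.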

Finally, perhaps there should be some discussion about whether we
think git should be maintaining indexes based on the Commit-Id.
Personally, cutting and pasting a random 17 character ID is painful
and annoying, and when I see it in my shell history, I have no idea
what might have been going on.  So if I need to cut and paste a
Commit-Id, I might as well cut and paste the one-line commit summary,
and do a "git log --grep" search based on that.  But if the Commit-Id
is indexed, then maybe it might be more useful?  I dunno....

> So how much of the [details] do you want specified?  If you want to be
> able to go from "change ID" to CR generically for all CR tools then the
> best -and perhaps only reasonable- way is to make the change ID a
> URI.  Or if you think the [details] can be elided and still have
> semantics that are well-defined enough then I think you agree with me
> more than you disagree :)

Well, see above about some possible semantics.  I'm *still* not
convinced even with the better-defined semantics it's worth storing
the extra baggage in the commit header.  But that's more of a
value/philosophical question, much like how we "could" store explicit
file rename information in the git commit, but in the very early days
of the git design history, although BitKeeper did track file names,
Linus consciously decided to go down a much simpler path.  So that's
really more of a SMTP vs X.400 preference of simplicity versus
complexity in the protocol versus implementation, which is something
where people of good will might disagree --- and there Junio's
opinions matter far more than mine.  :-)

Cheers,

					- Ted