Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 14 Apr 2025 08:13:18 -0700

"Theodore Ts'o" <tytso@xxxxxxx> writes:

> On Fri, Apr 11, 2025 at 10:44:43AM -0700, Junio C Hamano wrote:
>> 
>> The submitting contributor must make a conscious arrangement to give
>> a "patch set ID" shared among the messages in a single iteration,
>> and everybody who is responding must make sure they do not add the
>> same ID to the messages they throw at the thread in response.  Those
>> who use format-patch and send-email can do that with convention and
>> automation and there is no reason to rely on In-Reply-To: header
>> (which may confuse the automated recipient of manually created
>> follow-up messages).
>
> So it all depends on how the patch set ID is implemented.  Here's one
> way that I had in mind.  The reason why I like this over the
> Change-ID approach is that the semantics can be very clearly defined,
> and the only thing we rely on is the user saying "this new commit is
> part of patch series which I'm putting together". 
>
> By default, when creating a new commit, the field is empty (in which
> case the patch set ID is presumed to be the same as the commit ID).
> If the user gives a command-line flag, say "git commit --series",
> to indicate that the commit is part of a patch series, the patch set
> ID of the new commit is set to the patch set ID of the current
> commit (i.e., eventually, its parent commit).
>
> Whenever the commit is amended or rebased or cherry picked, if the
> patch series ID is NULL, then it is set to the original commit ID.
> Otherwise, the existing patch set ID is preserved.
>
> The patch set ID will be output by git format-patch (perhaps as "Patch
> Series ID: sha hash") immediately after the --- line.  And if it is
> present, "git am" will import that patch series ID into the git commit
> which it creates when it sucks in the e-mail.
>
> The net effect of this is that for new versions of git which implement
> the Patch Set ID, all new commits are treated as patch series of
> length 1, unless a subsequent commit is created using "git commit
> --series".  And the Patch Set ID will be preserved across
> cherry-picks, rebase operations, and git send-email/git apply-message
> operations.
>
> So if someone replies to an existing e-mail thread with a new commit,
> git format-patch will give it a different patch set ID, so we can
> distinguish it from an amended copy of a patch in the patch series.
>
> It also means that for singleton commits, the patch set ID effectively
> acts much like the traditional Change-ID.  For multi-commit patch series,
> all of the commits will have the same patch set ID.

Yeah, I like that aspect the best---the case for single commit
series falling out as a natural degenerate case of the more general
case to support multi-commit series is a good sign that the design
got something right ;-)
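
To make the quoted rules concrete, here is a minimal Python sketch of
them.  It is only a toy model, not anything git implements today: the
"--series" flag is the one proposed above, and the names Commit,
commit(), rewrite() and the fake object names are hypothetical and
exist purely for illustration.

```python
import hashlib
import itertools
from dataclasses import dataclass
from typing import Optional

_counter = itertools.count()


def _new_id(message: str) -> str:
    # Hypothetical stand-in for a real object name; unique per call.
    return hashlib.sha1(f"{next(_counter)}:{message}".encode()).hexdigest()[:12]


@dataclass(frozen=True)
class Commit:
    commit_id: str
    parent: Optional["Commit"]
    series_field: Optional[str]  # None models "the field is empty"

    @property
    def patch_set_id(self) -> str:
        # An empty field means the patch set ID is presumed to be the commit ID.
        if self.series_field is None:
            return self.commit_id
        return self.series_field


def commit(parent: Optional[Commit], message: str, series: bool = False) -> Commit:
    # Models "git commit" and the proposed "git commit --series":
    # with series=True the new commit inherits its parent's patch set ID.
    field = parent.patch_set_id if (series and parent is not None) else None
    return Commit(_new_id(message), parent, field)


def rewrite(old: Commit, message: str, new_parent: Optional[Commit] = None) -> Commit:
    # Models amend / rebase / cherry-pick: an empty field is filled with the
    # original commit ID, otherwise the existing patch set ID is preserved.
    field = old.patch_set_id
    parent = new_parent if new_parent is not None else old.parent
    return Commit(_new_id(message), parent, field)
```

In this model a commit made without series=True is a series of length
one whose patch set ID is its own name, and amending it keeps that
original name as the ID, which is the Change-ID-like behaviour for
the single-commit case.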

I am still not sure what to think about the lack of an explicit
evolution history for the iterations of one patch set that share the
same patch set ID.

When we have 10 commits that share the same patch set ID, I can
imagine that we can easily tell 3 are from one iteration, and 3 and
4 among the rest are from another two iterations by noticing that
there are three strands of pearls, having 3, 3, and 4 commits on them.
And we can identify the initial round by noticing that one of the
commits has its name as the patch set ID, but I am not sure if we
should be OK with not having anything but the committer timestamp to
tell which one of the other two iterations is earlier, and we
cannot tell anything about these two other iterations if they are
independent rewrites of the original round.
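
The "strands of pearls" reading can be sketched in Python as below.
This is a rough illustration under assumptions that are mine, not
part of the proposal: every strand is a linear chain, the input holds
only commits that already share one patch set ID, and the Commit
tuple and both function names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Commit(NamedTuple):
    commit_id: str
    parent_id: Optional[str]
    committer_time: int


def split_into_strands(commits: List[Commit]) -> List[List[Commit]]:
    # A strand is a maximal chain of commits whose parents are also in the set.
    by_id = {c.commit_id: c for c in commits}
    parents_in_set = {c.parent_id for c in commits if c.parent_id in by_id}
    tips = [c for c in commits if c.commit_id not in parents_in_set]
    strands = []
    for tip in tips:
        chain = []
        cur: Optional[Commit] = tip
        while cur is not None:
            chain.append(cur)
            cur = by_id.get(cur.parent_id)  # None once we leave the set
        chain.reverse()  # oldest commit first
        strands.append(chain)
    return strands


def initial_round(strands: List[List[Commit]], patch_set_id: str) -> Optional[List[Commit]]:
    # The initial round is the strand containing the commit whose own name
    # is the patch set ID.
    for strand in strands:
        if any(c.commit_id == patch_set_id for c in strand):
            return strand
    return None
```

Nothing in the structure orders the remaining strands; sorting them by
the committer timestamp of their tips is the only option this model
has, which is exactly the gap described above.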

But other than that, I like something with clearly defined semantics
(and the definition coming naturally out of the structure, not out
of some arbitrary convention that forces us to bring in some
semantics), and what you outlined above looks reasonably clean and
easy to use.

Thanks.