Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)

Jacob Keller <jacob.e.keller@xxxxxxxxx> · Tue, 15 Apr 2025 14:38:54 -0700

On 4/12/2025 4:13 PM, Theodore Ts'o wrote:
> On Fri, Apr 11, 2025 at 10:44:43AM -0700, Junio C Hamano wrote:
>>
>> The submitting contributor must make a conscious arrangement to give
>> a "patch set ID" shared among the messages in a single iteration,
>> and everybody who is responding must make sure they do not add the
>> same ID to the messages they throw at the thread in response.  Those
>> who use format-patch and send-email can do that with convention and
>> automation and there is no reason to rely on In-Reply-To: header
>> (which may confuse the automated recipient of manually created
>> follow-up messages).
> 
> So it all depends on how the patch set ID is implemented.  Here's one
> way that I had in mind.  The reason why I like this over the
> Change-ID approach is that the semantics can be very clearly defined,
> and the only thing we rely on is the user saying "this new commit is
> part of patch series which I'm putting together". 
> 

I've been catching up on this thread, trying to get a sense of the
discussion. I like the notion of this patch-id. I think dealing with
patch series as a single entity with one patch-id is nice.

In my experience with Gerrit, treating a patch series as individual
reviews, each with its own change-id, usually discouraged working in
series and especially discouraged splitting a patch in two after a
review had started. I prefer being able to collate the series together,
so a patch-id is useful.

Having the singleton change-id semantics naturally emerge is nice.

One thing I really liked from earlier in the thread was the "reverse
hex" suggestion, where a change-id is encoded using letters from the end
of the alphabet. I really like that it is immediately unambiguous whether
you are looking at a change-id value or a commit-id. Obviously there are
lots of other ways to encode this, and I think the thread has discussed
numerous options.

> By default when creating a new commit, the field is empty (in which
> case the patch set ID is presumed to be the same as the commit ID), or
> if the user gives a command-line flag say, "git commit --series"
> which indicates that it is part of a patch series in which case the
> patch set ID of the commit is set to the patch set ID of the current
> commit (i.e., eventually, its parent commit).
> 

> Whenever the commit is amended or rebased or cherry picked, if the
> patch series ID is NULL, then it is set to the original commit ID.
> Otherwise, the existing patch set ID is preserved.

Ok, so it starts null, but as soon as you rebase/amend/etc. the commit,
it's set. This is how the change-id semantics fall out for singleton
commits.
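
Here is how I read those rules, as a small Python model. This is only a
sketch of my understanding: the Commit class and the helper names are
hypothetical, the commit IDs are placeholder strings, and "git commit
--series" is the proposed flag from above, not something git has today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    commit_id: str
    series_id: Optional[str] = None  # None models the empty field

    def effective_series_id(self) -> str:
        # An empty field is presumed to mean "same as the commit ID".
        if self.series_id is not None:
            return self.series_id
        return self.commit_id


def new_commit(commit_id: str, parent: Optional[Commit] = None,
               series: bool = False) -> Commit:
    """Model a plain commit, or the proposed "git commit --series"."""
    if series and parent is not None:
        # Join the parent's series by copying its (effective) ID.
        return Commit(commit_id, parent.effective_series_id())
    return Commit(commit_id)  # field starts out empty


def rewrite(old: Commit, new_commit_id: str) -> Commit:
    """Model amend / rebase / cherry-pick producing a new commit ID."""
    # An empty field is set to the original commit ID; an existing
    # value is preserved unchanged.
    return Commit(new_commit_id, old.effective_series_id())


a = new_commit("aaa1")      # series_id is None
a2 = rewrite(a, "aaa2")     # series_id becomes "aaa1"
a3 = rewrite(a2, "aaa3")    # series_id stays "aaa1"
b = new_commit("bbb1", parent=a3, series=True)  # series_id "aaa1"
```

For the singleton commit, the ID stays fixed at "aaa1" across every
rewrite, which is exactly the change-id behaviour.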

> 
> The patch set ID will be output by git format-patch (perhaps as "Patch
> Series ID: sha hash") immediately after the --- line.  And if it is
> present, "git am" will import that patch series ID into the git commit
> which it creates when it sucks in the e-mail.
> 
> The net effect of this is that for new versions of git which implement
> the Patch Set ID, all new commits are treated as patch series of
> length 1, unless a subsequent commit is created using "git commit
> --series".  And the Patch Set ID will be preserved across
> cherry-picks, rebase operations, and git send-email/git apply-message
> operations.

I think it's likely that tooling will emerge to retroactively convert a
branch into a "series" with the same patch-id once a series has been
formed.
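
As a rough sketch of what such tooling might do (in Python, with made-up
names), assume the whole branch simply adopts the ID of its oldest
commit. That choice is my assumption, not something proposed in the
thread. The model also ignores the fact that rewriting a commit in real
git would change its commit ID.

```python
from typing import List, Optional, Tuple

# (commit_id, series_id); a series_id of None models the empty field.
Commit = Tuple[str, Optional[str]]


def make_series(branch: List[Commit]) -> List[Commit]:
    """Stamp every commit on a branch (oldest first) with one shared ID."""
    if not branch:
        return []
    first_id, first_series = branch[0]
    # Reuse the oldest commit's ID if it has one, else its commit ID,
    # matching the "empty means same as commit ID" rule.
    shared = first_series if first_series is not None else first_id
    return [(commit_id, shared) for commit_id, _ in branch]


branch = [("aaa1", None), ("bbb1", None), ("ccc1", "ccc0")]
print(make_series(branch))
# [('aaa1', 'aaa1'), ('bbb1', 'aaa1'), ('ccc1', 'aaa1')]
```

Note that the third commit loses its earlier ID ("ccc0") here, so a real
tool would have to decide whether overwriting an existing ID is
acceptable.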

> 
> So if someone replies to an existing e-mail thread with a new commit,
> git format-patch will give it a different patch set ID, so we can
> distinguish it from an amended copy of a patch in the patch series.
> 
> It also means that for singleton commits, the patch ID effectively acts
> much like the traditional Change-ID.  For multi-commit patch series,
> all of the commits will have the same patch set ID.
> 
>        	   	   	     	      - Ted
>