Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)

"Remo Senekowitsch" <remo@xxxxxxxxxxx> · Wed, 16 Apr 2025 00:30:22 +0200

On Mon Apr 14, 2025 at 5:13 PM CEST, Junio C Hamano wrote:
> "Theodore Ts'o" <tytso@xxxxxxx> writes:
>
>> On Fri, Apr 11, 2025 at 10:44:43AM -0700, Junio C Hamano wrote:
>>> 
>>> The submitting contributor must make a conscious arrangement to give
>>> a "patch set ID" shared among the messages in a single iteration,
>>> and everybody who is responding must make sure they do not add the
>>> same ID to the messages they throw at the thread in response.  Those
>>> who use format-patch and send-email can do that with convention and
>>> automation and there is no reason to rely on In-Reply-To: header
>>> (which may confuse the automated recipient of manually created
>>> follow-up messages).
>>
>> So it all depends on how the patch set ID is implemented.  Here's one
>> way that I had in mind.  The reason why I like this over the
>> Change-ID approach is that the semantics can be very clearly defined,
>> and the only thing we rely on is the user saying "this new commit is
>> part of patch series which I'm putting together". 
>>
>> By default when creating a new commit, the field is empty (in which
>> case the patch set ID is presumed to be the same as the commit ID), or
>> if the user gives a command-line flag say, "git commit --series"
>> which indicates that it is part of a patch series in which case the
>> patch set ID of the commit is set to the patch set ID of the current
>> commit (i.e., eventually, its parent commit).
>>
>> Whenever the commit is amended or rebased or cherry picked, if the
>> patch series ID is NULL, then it is set to the original commit ID.
>> Otherwise, the existing patch set ID is preserved.
>>
>> The patch set ID will be output by git format-patch (perhaps as "Patch
>> Series ID: sha hash") immediately after the --- line.  And if it is
>> present, "git am" will import that patch series ID into the commit
>> which it creates when it sucks in the e-mail.
>>
>> The net effect of this is that for new versions of git which implement
>> the Patch Set ID, all new commits are treated as patch series of
>> length 1, unless a subsequent commit is created using "git commit
>> --series".  And the Patch Set ID will be preserved across
>> cherry-picks, rebase operations, and git send-email/git apply-message
>> operations.
>>
>> So if someone replies to an existing e-mail thread with a new commit,
>> git format-patch will give it a different patch set ID, so we can
>> distinguish it from an amended copy of a patch in the patch series.
>>
>> It also means that for singleton commits, the patch set ID effectively
>> acts much like the traditional Change-ID.  For multi-commit patch series,
>> all of the commits will have the same patch set ID.
>
> Yeah, I like that aspect the best---the case for single commit
> series falling out as a natural degenerate case of the more general
> case to support multi-commit series is a good sign that the design
> got something right ;-)
>
> I am still not sure what to think about the lack of an explicit
> evolution history for commits that share the same patch set ID.
>
> When we have 10 commits that share the same patch set ID, I can
> imagine that we can easily tell 3 are from one iteration, and 3 and
> 4 among the rest are from another two iterations by noticing that
> there are three strands of pearls, having 3, 3, and 4 commits on them.
> And we can identify the initial round by noticing that one of the
> commits has its name as the patch set ID, but I am not sure if we
> should be OK by not having anything but the committer timestamp to
> tell which one among the other two iterations is earlier, and we
> cannot tell anything about these two other iterations if they are
> independent rewrites of the original round.
>
> But other than that, I like something with clearly defined semantics
> (and the definition coming naturally out of the structure, not out
> of some arbitrary convention that forces to bring in some
> semantics), and what you outlined above looks reasonably clean and
> easy to use.

Doesn't a patch set ID suffer from the same kind of ambiguity the
change-id supposedly does? Patch sets can be split and merged, a commit
from one patch set can be cherry-picked into another. What patch set ID
should such a cherry-picked commit have?
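
To make the question concrete, here is a rough Python model of the rules
quoted above, as I read them. The names (Commit, effective_id,
new_commit, rewrite) are made up for illustration and are not an
existing Git API; commit IDs are simple counters standing in for hashes.

```python
from dataclasses import dataclass
from typing import Optional
import itertools

_ids = itertools.count(1)  # stand-in for real commit hashes


@dataclass(frozen=True)
class Commit:
    commit_id: str
    patch_set_id: Optional[str]  # None models the empty field


def effective_id(c: Commit) -> str:
    # An empty field means "patch set ID is the commit ID".
    return c.patch_set_id or c.commit_id


def new_commit(parent: Optional[Commit] = None, series: bool = False) -> Commit:
    # Hypothetical "git commit --series": inherit the parent's patch set ID.
    inherited = effective_id(parent) if (series and parent is not None) else None
    return Commit(f"c{next(_ids)}", inherited)


def rewrite(c: Commit) -> Commit:
    # Amend, rebase, or cherry-pick: new commit ID, patch set ID preserved
    # (or set to the original commit ID if the field was empty).
    return Commit(f"c{next(_ids)}", effective_id(c))


# Patch set A with two commits, patch set B with two commits.
a1 = new_commit()
a2 = new_commit(a1, series=True)
b1 = new_commit()
b2 = new_commit(b1, series=True)

# Cherry-pick a2 on top of patch set B.
picked = rewrite(a2)

# The picked commit now lives in B's history but still carries A's ID.
print(effective_id(picked) == effective_id(a1))  # True
print(effective_id(picked) == effective_id(b1))  # False
```

Under these rules the cherry-picked commit keeps claiming membership in
the set it came from, which may or may not be what the user intended.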

And I think the argument that a change-id for a singleton patch
set naturally falls out of the patch set ID can easily be reversed.
Admittedly, I don't have the most experience with the mailing list
workflow, but a multi-commit patch set usually comes with a cover
letter, right? And people like to track their cover letter in a commit?
IIUC, b4 is designed around that too.

In that case, the cover letter has its own change-id as any other
commit, which will naturally remain stable across every version of the
patch set. It would be nonsensical to squash, split or cherry-pick the
cover letter commit. Sounds like a great candidate for the patch set ID.

So the patch set ID can just as naturally flow out from the change-id.
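
A minimal sketch of that idea, assuming the cover letter is tracked as
one flagged commit in the series. The is_cover_letter flag, the
patch_set_id helper and the change-id values are invented for the
example; how a tool like b4 actually marks its cover letter commit is
not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Commit:
    change_id: str
    subject: str
    is_cover_letter: bool = False  # hypothetical marker


def patch_set_id(series: List[Commit]) -> str:
    # The patch set ID is simply the cover letter commit's change-id.
    covers = [c for c in series if c.is_cover_letter]
    if len(covers) != 1:
        raise ValueError("expected exactly one cover letter commit")
    return covers[0].change_id


v1 = [
    Commit("Icover", "cover letter", is_cover_letter=True),
    Commit("Iaaa", "add parser"),
    Commit("Ibbb", "add tests"),
]

# v2 splits a new commit out of the series; the cover letter is untouched.
v2 = [
    Commit("Icover", "cover letter", is_cover_letter=True),
    Commit("Iaaa", "add parser"),
    Commit("Iccc", "split out lexer"),
    Commit("Ibbb", "add tests"),
]

# Both versions resolve to the same patch set ID.
print(patch_set_id(v1) == patch_set_id(v2))  # True

# The per-commit change-ids additionally show which commits carried over.
carried_over = {c.change_id for c in v1} & {c.change_id for c in v2}
print(sorted(carried_over))  # ['Iaaa', 'Ibbb', 'Icover']
```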

I can see two concrete disadvantages of the patch set ID:

* It's strictly less powerful. As explained, the change-id can do
  everything the patch set ID can via the cover letter. But the patch
  set ID cannot help you track how individual commits within the patch
  set evolved.

* It's more complicated. While many Git users work with patch sets every
  day, it's not a concept in Git itself. Git only knows about commits.
  The patch set ID would introduce a new concept into Git unnecessarily,
  while the change-id naturally extends the language Git already speaks,
  that of commits.

Remo