Phillip Wood <phillip.wood123@xxxxxxxxx> writes:

> Thanks for sharing that, it is an interesting list. On the subject of
> encoding I do think our documentation could be clearer that the
> encoding applies to all the headers as well as the commit message. As
> far as I can see it only mentions the commit message, not the author
> or committer identities but repo_logmsg_reencode() re-encodes the
> whole commit buffer. Out of interest do you think we could be doing a
> better job with fsck to pick up some of these problems earlier?
>
> I think "git rebase" only cares that the author identity can be parsed
> by split_ident() which is fairly lenient.

"rebase" already knows that it has to be picky about which header
fields need to be propagated and which must not be, doesn't it?  Can
the same be said for arbitrary "extra" header fields?

Information on some of the header fields is inherently destroyed
when you refine an existing commit.

 - The value on the 'parent' headers may need to be updated (unless
   "rebase" is fast-forwarding an earlier part of the changes on the
   same base).

 - The 'author' information usually wants to be preserved, but when
   the scale of the change since the previous iteration is so large,
   you may give it a new authorship.

 - The 'committer' information should record who created the new
   commit object that records the result of rebasing.

 - The 'gpgsig' and 'gpgsig-sha256' header fields would lose validity
   if you are creating a new object that is different from the
   original by even a single bit.  If we are somehow recording which
   predecessor commit the new one replaces, it certainly is safe to
   drop these that have lost validity, as we can go back to the
   predecessor to see it has a valid signature, and the change in
   the new commit that lost the signature fields is the moral
   equivalent of the original.  Otherwise, carrying a stale
   signature may serve as a reminder that the commit was rewritten
   in the past---I dunno.

And so on.

Now, one thing that worries me is this.  If "extra" commit headers
include truly extra fields with unknown semantics, the machinery
cannot tell which ones are safe and beneficial to propagate.

Thanks.
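
To make the worry concrete, here is a rough sketch (in Python, not
Git's actual code) of a per-header policy table for rewriting a
commit.  The names Action, POLICY, parse_headers() and classify() are
all made up for illustration, and the policy choices merely restate
the discussion above; dropping a stale signature is only one of the
two options considered there.  The point is the fallback: any header
the table does not know about can only be reported as undecidable.

```python
from enum import Enum


class Action(Enum):
    RECOMPUTE = "recompute"  # value must be derived afresh for the new commit
    PRESERVE = "preserve"    # usually carried over from the original
    DROP = "drop"            # loses validity once the object changes
    UNKNOWN = "unknown"      # semantics not known, so no safe decision


# Hypothetical policy table; it covers only the headers discussed above.
POLICY = {
    "parent": Action.RECOMPUTE,
    "author": Action.PRESERVE,
    "committer": Action.RECOMPUTE,
    "gpgsig": Action.DROP,
    "gpgsig-sha256": Action.DROP,
}


def parse_headers(raw: str) -> list[tuple[str, str]]:
    """Split the header part of a raw commit object into (key, value) pairs.

    Headers end at the first blank line.  A line that begins with a
    space continues the value of the previous header.
    """
    headers: list[tuple[str, str]] = []
    for line in raw.split("\n"):
        if line == "":
            break  # the rest is the commit message
        if line.startswith(" ") and headers:
            key, value = headers[-1]
            headers[-1] = (key, value + "\n" + line[1:])
        else:
            key, _, value = line.partition(" ")
            headers.append((key, value))
    return headers


def classify(raw: str) -> list[tuple[str, Action]]:
    """Pair every header key with the action the policy table gives it."""
    return [(key, POLICY.get(key, Action.UNKNOWN))
            for key, _ in parse_headers(raw)]
```

Feeding this a commit that carries a made-up "x-custom" header yields
Action.UNKNOWN for it, which is exactly the case where the machinery
has nothing to base a decision on.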