On Thu, Jun 26, 2025 at 9:12 PM Elijah Newren <newren@xxxxxxxxx> wrote: > > On Fri, Jun 20, 2025 at 9:12 AM Christian Couder > <christian.couder@xxxxxxxxx> wrote: > > In https://lore.kernel.org/git/aD4i7YhUnT5Kgew-@xxxxxxxxxxxxxxxxxxxxxxxxxxxx/ > > brian said: > > > > "Actually, what I was saying is that we should have one for the hash > > algorithm that is used in the Git object. I don't care about the hash > > algorithm used in OpenPGP, X.509, or OpenSSH (that is, whether it's > > signed with SHA-512 or SHA-256), but we can have multiple signatures in > > a single commit such that there's both a SHA-1 signature and a SHA-256 > > signature." > > > > so I implemented the possibility that there's both a SHA-1 signature > > and a SHA-256 signature in a single commit. > > > > If you disagreed with what brian suggested, it would have been nice to > > reply to brian then. > > Sorry about that. I hadn't read the whole thread; further, I was > previously trying to avoid the details of signatures beyond the basics > and hoping to leave those details to others, but between v2 and some > words you had in this version plus deciding to just take a closer look > while on vacation (as I was last week), I decided to try to look a bit > closer. Thanks for taking a closer look. > Turns out, though, that despite my attempts to read a bit more > documentation and code in the project, I still had a fundamental > misunderstanding last week when I was responding. I had been assuming > that the gpgsig headers would appear on sha1-commits and gpgsig-sha256 > headers would appear on sha256 commits. I didn't understand that both > types of headers can appear on both types of commits (so that, for > example, gpgsig-sha256 would be use the sha256-commit as the source > and sign that data but write the result to a sha1-commit). I talked > with brian this morning to ask a few questions and he clarified this > for me. > > (Side note: It might be nice to clarify this in > Documentation/technical/hash-function-transition.adoc; that document > did not dispel this confusion when I read it last week.) I agree that it would help to improve that document. I think it's a separate topic though. > And the part I didn't understand the first time around is that e.g. > the SHA-1 commit can include both the signatures on the SHA-1 object > and on the SHA-256 object. (Similarly, the SHA-256 commit can include > both signatures as well.) > > > > > - if there is more than one signature on the SHA-1 object or on > > > > the SHA-256 object, a warning is emitted, > > This is now ambiguous to me. more than one signature using the SHA-1 > as source for signing, or more than one signature recorded within the > SHA-1 commit? I think you mean the former. Yes, it's the former. I am not sure there is a simple and short way to disambiguate these meanings. > > My opinion on this is that in cases like this we might not always know > > what could be useful for tools or users in general, so it might be > > better to provide more information that can be easily discarded if not > > useful rather than not enough information. > > > > brian seemed to say that <git-hash-algo> and <signature-format> are > > important so I just prefered to have both, especially as they are not > > costly to get. > > So, with my new understanding, <git-hash-algo> is necessary because if > you only had the gpg signature on a commit object, you don't know what > it was a signature of -- it might be a signature of that commit > object, or it might be a signature of it's "compatibility" object > created with a different git hashing algorithm. I agree that it's important for this reason. > So, fast-import -- > regardless of whether it is writing into a sha1 repo or a sha256 repo > -- won't be able to know how to check the validity of the gpg > signature unless it knows whether to check the signature of the sha1 > commit or the sha256 commit. And if the repository that fast-import> is writing to isn't writing compatibility objects for interoperation > at all, then it won't be able to check any compatibility signatures at > all; it'll only be able to check signatures of the actual object it > wants to write. Yeah, right. > > > Is the <signature-format> merely self-inflicted pain from stripping > > > the ascii armor lines? If so, would it make more sense to just > > > include those armor lines as-is in the fast-export stream and let > > > fast-import process it? > > > > The ascii armor lines are kept in the fast-export stream, but > > fast-export's ouput is supposed to be processed somehow sometimes > > before being fed back to fast-import, so I think we should make it > > easy for tools or people processing the output to get signature > > information. > > Ah, just for convenience of the importer/processor; fair enough. Yeah, right. > > > Is the <git-hash-algo> due to the fact that we have separate `gpgsig` > > > and `gpgsig-sha256` commit headers and we want to use that information > > > to avoid writing these headers to the wrong-sized objects (and/or to > > > avoid checking whether the signature is valid on the wrong-sized > > > objects)? > > > > Yeah, I think it could help with that, but brian might better answer that. > > I think the answer is actually no, this isn't at all what these are > about. These are about knowing how to check the validity of the > signatures, because the signatures aren't necessarily associated with > the object they were read from; they could be a signature of its > compatibility object instead. First I am now not sure what "wrong-sized objects" meant in your previous sentence. I thought it meant "object using SHA-1 vs object using SHA-256". In that case, the <git-hash-algo> could indeed help check the signature on the right object. Also git fast-import has to recreate the `gpgsig` and `gpgsig-sha256` commit headers when importing signatures, so <git-hash-algo> can help it with that too. > > > If so, could that be spelled out in the docs as well, > > > especially since it appears that the intent of these headers is left > > > unimplemented due to not changing fast-import to do anything with > > > them? > > > > fast-import puts back the `gpgsig` or `gpgsig-sha256` headers > > depending on <git-hash-algo>, so it's useful at least for that. I will > > improve the docs about it. In v5, I have improved the fast-import documentation with the following: * `<git-hash-algo>` specifies which Git object format this signature applies to, either `sha1` or `sha256`. This allows to know which representation of the commit was signed (the SHA-1 or the SHA-256 version) which helps with both signature verification and interoperability between repos with different hash functions. * `<signature-format>` specifies the type of signature, such as `openpgp`, `x509`, `ssh`, or `unknown`. This is a convenience for tools that process the stream, so they don't have to parse the ASCII armor to identify the signature type. The commit message is improved too. > > > And if <git-hash-algo>'s purpose is to ensure they are only used when > > > writing same-sized object as what was exported, then...isn't that a > > > bug? > > > > I am not sure I understand what would be the bug. > > It's not relevant given my misunderstanding, but to explain what I was > thinking last week (note that the first two bullets are my mistake): > > * gpgsig headers only occur on sha1 commits and are the signatures > of the sha1 commit sans the gpgsig header > * gpgsig-sha256 headers are similar with respect to sha256 commits > * we were using <git-hash-algo> as an optimization to only sign > commits in fast-import if the repository we were writing to was using > the same hash function as the repo we were writing from > > In such a case, that'd be a bug for anyone that wanted a > 'resign-commits-that-were-previously-signed-if-the-signature-becomes-invalid'. > Although a signature of a sha256 commit clearly won't verify if we > exported a sha256 repo and imported it as a sha1 repo and tried to > verify that signature using a sha1 commit, that doesn't mean we should > just ignore the fact that the sha256 commit had a signature. The user > clearly expressed the intent to resign any commits that had a > signature before but where the signature is no longer valid. I see. > > > This series was started because people wanted to be able to do > > > things like keeping signature even when they are no longer valid or > > > resigning commits that have a no longer commit signature (among other > > > uses), but that would mean that if someone exports a sha1 repository > > > and imports it as sha256, we don't want to ignore the fact that the > > > sha1 commit was signed for those usecases. > > > > > > If, however, the <git-hash-algo>'s purpose is merely as a performance > > > optimization that fast-import can employ in the cases where it checks > > > for signatures being valid, so it can avoid checking when it know the > > > hash size isn't even the same, then it could make sense. > > > > I think it could be used for that, but I'd like brian's opinion about this. > > No, I was just wrong; it can't be used as such a performance > optimization. A signature of a sha256 commit stored in a sha1 commit > (or a signature of a sha1 commit stored in a sha256 commit) isn't > accidental, it's part of the intentional design of the hash-function > transition. > > But, it might mean that the rules for what to do with signatures in > fast-import need to be more complex than what I previously > wrote...which I'll comment on a bit below. Ok. > > > Wait..does this mean fast-export is obligated to walk over both all > > > sha1 commits and all "equivalent" sha256 commits when exporting a > > > repo? I thought most operations on the repo would walk over only one > > > or the other; walking over both seems to be against the spirit of the > > > "fast" in "fast-export". Am I missing something? (Possibly related > > > question: Does "git log" bother walking over both, or does it only > > > walk over one?) Even if this really is wanted by some users, > > > shouldn't they manually request it rather than making exports slow for > > > everyone else by default? > > > > No fast-export doesn't walk over both the sha1 commits and all > > "equivalent" sha256 commits, it can just process at most 2 signatures > > for a given commit, one with the `gpgsig` header and one with the > > `gpgsig-sha256` header. I will try to reword the doc to make it > > clearer. In v5 all the signatures are exported, so the doc is now: "While all the signatures of a commit are exported, an importer may choose to accept only some of them. For example linkgit:git-fast-import[1] currently stores at most one signature per Git hash algorithm in each commit." I hope it avoids misunderstandings. > Right, but fast-import might need to *write* both sha1 commits and > sha256 commits (or at least write one of them and keep a mapping > between the two in memory) in order to correctly verify and process > both gpgsig headers...or at least, it will be unable to verify > signatures of compatibility objects if people do not have the > appropriate extensions.compatObjectFormat config setting set within > the repository they are importing to. Yeah, it seems to me that when extensions.compatObjectFormat is set, then a compatibility mapping between objects can be used. I don't think using it should be part of this patch though. When I will work on making it possible for fast-import to check signatures, I will try to use it to write both sha1 commits and sha256 commits, and check both signatures. > This actually expands the usecases I mentioned previously a bit. > Those usecases were: > > (A) Make fast-export include signatures, and make fast-import include > them unconditionally (even if invalid) > (B) Similar to (A), but make *fast-import* check them and either error > out or drop them if they become invalid > (C) Simliar to (B), but make *fast-import* re-sign the commit if they > become invalid > (D) Similar to (A), but make *fast-import* re-sign the commit even if > the signature would have been valid > > Cases (B) and (C) might either need special care or need to bifurcate > into additional options given that we can have two signatures on a > commit. "Validity of gpg signature stored in the commit" now becomes > "Validity of gpg signatureS stored in the commit". Whereas before we > only considered the cases of "no valid signatures" and "all valid > signatures", we also need to worry about the case where one signature > is valid and the other is either known to be invalid or simply cannot > be checked because this repo isn't set up to write compatibility > objects. Yeah, I agree that there might be new cases to consider. I don't think this affects this patch though as long as it allows one signature for each hash to be exported and imported which is the case. > > Yeah, I would have been happy if we could have been aligned with the > > goals of the format and the fields earlier, but better late than > > never. > > I think the proposal for the fields in this version make sense now, > but it might have been easier to get alignment if (1) we could get > real testcases of the multiple signature case, There is such a multiple signature test case in v5. > and (2) we could see > the actual code for fast-import to deal with the signatures (trying to > guess the needs of the importer and the meanings of the fields we > export is supposed to be is harder when their usage has been left > unimplemented). That said, I also understand you wanting to avoid > implementing too much and throwing things away if we disagreed on > early decisions. Yeah, I prefer to move forward step by step on this. > Anyway, I'm on board with these new fields and their purpose, though > I'm still curious if we start diverging when we run across surprises > as we dig further into the implementation. As we have clearly warned that the current implementation is experimental, I think we should have more flexibility to adapt the formats and behaviors if we run across surprises in the next steps. Thanks for your thoughts and reviews.