On Thu, Jun 19, 2025 at 6:36 AM Christian Couder <christian.couder@xxxxxxxxx> wrote: > > A recent commit, d9cb0e6ff8 (fast-export, fast-import: add support for > signed-commits, 2025-03-10), added support for signed commits to > fast-export and fast-import. > > When a signed commit is processed, fast-export can output either > "gpgsig sha1" or "gpgsig sha256" depending on whether the signed > commit uses the SHA-1 or SHA-256 Git object format. > > However, this implementation has a number of limitations: > > - the output format was not properly described in the documentation, Thanks for working on fixing this. > - the output format is not very informative as it doesn't even say > if the signature is an OpenPGP, an SSH, or an X509 signature, Why would it need to say what type of signature it is? Don't the ascii armor lines have e.g. "----BEGIN PGP SIGNATURE----" and "----END PGP SIGNATURE----" around it, which fast-import can read as well as fast-export? Is the idea that we strip those lines and now need to replace the information we lost? > - the implementation doesn't support having both one signature on > the SHA-1 object and one on the SHA-256 object. Not sure I understand this; more questions around this later. > Let's improve on these limitations by improving fast-export and > fast-import so that: > > - both one signature on the SHA-1 object and one on the SHA-256 > object can be exported and imported, > - if there is more than one signature on the SHA-1 object or on > the SHA-256 object, a warning is emitted, > - the output format is "gpgsig <git-hash-algo> <signature-format>", > where <git-hash-algo> is the Git object format as before, and > <signature-format> is the signature type ("openpgp", "x509", > "ssh" or "unknown", > - the output is properly documented. Perhaps this was discussed in an earlier round, but if so I either forgot or missed it. What value does <git-hash-algo> and <signature-format> provide? How are they intended to be used? Is the <signature-format> merely self-inflicted pain from stripping the ascii armor lines? If so, would it make more sense to just include those armor lines as-is in the fast-export stream and let fast-import process it? Then we wouldn't have to introduce all the outputting and parsing of this new <signature-format> field and worry about the new special "unknown" status. Is the <git-hash-algo> due to the fact that we have separate `gpgsig` and `gpgsig-sha256` commit headers and we want to use that information to avoid writing these headers to the wrong-sized objects (and/or to avoid checking whether the signature is valid on the wrong-sized objects)? If so, could that be spelled out in the docs as well, especially since it appears that the intent of these headers is left unimplemented due to not changing fast-import to do anything with them? And if <git-hash-algo>'s purpose is to ensure they are only used when writing same-sized object as what was exported, then...isn't that a bug? This series was started because people wanted to be able to do things like keeping signature even when they are no longer valid or resigning commits that have a no longer commit signature (among other uses), but that would mean that if someone exports a sha1 repository and imports it as sha256, we don't want to ignore the fact that the sha1 commit was signed for those usecases. If, however, the <git-hash-algo>'s purpose is merely as a performance optimization that fast-import can employ in the cases where it checks for signatures being valid, so it can avoid checking when it know the hash size isn't even the same, then it could make sense. But, short of that performance optimization, it's unclear to me whether we'd lose anything by simply exporting "gpgsig <contents of signature header from original object, including the armor lines>", and dropping the <git-hash-algo> and <signature-format> lines entirely. Am I missing something? (I may well be; I don't know much about signing stuff beyond the very basics, and don't mess with signed commits or tags much myself.) [...] > There are no tests in this v4 and in v3 with both a SHA-1 and a > SHA-256 signature on the same commit though, as I am not sure yet how > to best generate a commit with such signatures. Suggestions welcome! If no suggestions are forthcoming, it feels odd to implement this with no tests. Would it make sense to leave it out until we know how to test it? (More questions on this below...) [...] > diff --git a/Documentation/git-fast-export.adoc b/Documentation/git-fast-export.adoc > index 43bbb4f63c..64198f2186 100644 > --- a/Documentation/git-fast-export.adoc > +++ b/Documentation/git-fast-export.adoc > @@ -50,6 +50,23 @@ resulting tag will have an invalid signature. > is the same as how earlier versions of this command without > this option behaved. > + > +When exported, a signature starts with: > ++ > +gpgsig <git-hash-algo> <signature-format> > ++ > +where <git-hash-algo> is the Git object hash so either "sha1" or > +"sha256", and <signature-format> is the signature type, so "openpgp", > +"x509", "ssh" or "unknown". > ++ > +For example, an OpenPGP signature on a SHA-1 commit starts with > +`gpgsig sha1 openpgp`, while an SSH signature on a SHA-256 commit > +starts with `gpgsig sha256 ssh`. I had a number of comments/questions on this above already. > ++ > +Currently for a given commit, at most one signature for the SHA-1 > +object and one signature for the SHA-256 object are exported, each > +with their respective <git-hash-algo> identifier. Wait..does this mean fast-export is obligated to walk over both all sha1 commits and all "equivalent" sha256 commits when exporting a repo? I thought most operations on the repo would walk over only one or the other; walking over both seems to be against the spirit of the "fast" in "fast-export". Am I missing something? (Possibly related question: Does "git log" bother walking over both, or does it only walk over one?) Even if this really is wanted by some users, shouldn't they manually request it rather than making exports slow for everyone else by default? > +A warning is > +emitted for each additional signature found. Why? This seems odd to me. Why not merely export them all, and let fast-import throw warnings or errors if it sees more than one and is not yet prepared to handle multiple signatures? > ++ > NOTE: This is highly experimental and the format of the data stream may > change in the future without compatibility guarantees. > > diff --git a/Documentation/git-fast-import.adoc b/Documentation/git-fast-import.adoc > index 250d866652..db5e5c8da5 100644 > --- a/Documentation/git-fast-import.adoc > +++ b/Documentation/git-fast-import.adoc > @@ -445,7 +445,7 @@ one). > original-oid? > ('author' (SP <name>)? SP LT <email> GT SP <when> LF)? > 'committer' (SP <name>)? SP LT <email> GT SP <when> LF > - ('gpgsig' SP <alg> LF data)? > + ('gpgsig' SP <algo> SP <format> LF data)? > ('encoding' SP <encoding> LF)? > data > ('from' SP <commit-ish> LF)? > @@ -518,13 +518,32 @@ their syntax. > ^^^^^^^^ > > The optional `gpgsig` command is used to include a PGP/GPG signature > -that signs the commit data. > +or other cryptographic signature that signs the commit data. Good catch. > > -Here <alg> specifies which hashing algorithm is used for this > -signature, either `sha1` or `sha256`. > +.... > + 'gpgsig' SP <git-hash-algo> SP <signature-format> LF > + data > +.... > + > +The `gpgsig` command takes two arguments: > + > +* `<git-hash-algo>` specifies which Git object format this signature > + applies to, either `sha1` or `sha256`. > + > +* `<signature-format>` specifies the type of signature, such as > + `openpgp`, `x509`, `ssh`, or `unknown`. > + > +A commit may have at most one signature for the SHA-1 object format > +(stored in the "gpgsig" header) and one for the SHA-256 object format > +(stored in the "gpgsig-sha256" header). Why? Does this mean hg-fast-export or hg-fast-import (or a jj-fast-export or jj-fast-import) wouldn't be allowed to specify multiple signatures? The fast-export and fast-import streams are often used for interoperation with other VCSes, but as far as I can tell, you're encoding a restriction on what's allowed that isn't an actual problem for git but just a not-yet-implemented state. If I've understood correctly that the restriction is merely due to the current implementation, perhaps the wording could be changed to not list it as an encoding restriction, but as a current fast-import limitation? Also, I'm slightly uncomfortable with "the SHA-1 object format" and "the SHA-256 object format" because of the fact that these tools are used for interoperability with similarly-named tools from other VCSes. I think it'd be better to just treat them equally as "here was/were some signature(s) on the object in the original repo; importers can choose what to do with them". > + > +See below for a detailed description of the `data` command which > +contains the raw signature data. > + > +Signatures are not yet checked in the current implementation though. ...but what does happen with those signatures? Dropped? Kept as-is? Can we just spell this out a bit more clearly? e.g. "Signatures are not checked in the current implementation; they are used as-is, which may mean the signatures are invalid in the imported repository." > -NOTE: This is highly experimental and the format of the data stream may > -change in the future without compatibility guarantees. > +NOTE: This is highly experimental and the format of the `gpgsig` > +command may change in the future without compatibility guarantees. Good clarification. I briefly skimmed the implementation and test files, and didn't see any problems...but I think it probably makes more sense to get aligned on the goals of the format and how these fields are meant to be used before diving into those details closer. Thanks for working on this topic.