Re: [PATCH v4] fast-(import|export): improve on commit signature output format

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 19, 2025 at 6:36 AM Christian Couder
<christian.couder@xxxxxxxxx> wrote:
>
> A recent commit, d9cb0e6ff8 (fast-export, fast-import: add support for
> signed-commits, 2025-03-10), added support for signed commits to
> fast-export and fast-import.
>
> When a signed commit is processed, fast-export can output either
> "gpgsig sha1" or "gpgsig sha256" depending on whether the signed
> commit uses the SHA-1 or SHA-256 Git object format.
>
> However, this implementation has a number of limitations:
>
>   - the output format was not properly described in the documentation,

Thanks for working on fixing this.

>   - the output format is not very informative as it doesn't even say
>     if the signature is an OpenPGP, an SSH, or an X509 signature,

Why would it need to say what type of signature it is?  Don't the
ascii armor lines have e.g. "----BEGIN PGP SIGNATURE----" and "----END
PGP SIGNATURE----" around it, which fast-import can read as well as
fast-export?  Is the idea that we strip those lines and now need to
replace the information we lost?

>   - the implementation doesn't support having both one signature on
>     the SHA-1 object and one on the SHA-256 object.

Not sure I understand this; more questions around this later.

> Let's improve on these limitations by improving fast-export and
> fast-import so that:
>
>   - both one signature on the SHA-1 object and one on the SHA-256
>     object can be exported and imported,
>   - if there is more than one signature on the SHA-1 object or on
>     the SHA-256 object, a warning is emitted,
>   - the output format is "gpgsig <git-hash-algo> <signature-format>",
>     where <git-hash-algo> is the Git object format as before, and
>     <signature-format> is the signature type ("openpgp", "x509",
>     "ssh" or "unknown",
>   - the output is properly documented.

Perhaps this was discussed in an earlier round, but if so I either
forgot or missed it.  What value does <git-hash-algo> and
<signature-format> provide?  How are they intended to be used?

Is the <signature-format> merely self-inflicted pain from stripping
the ascii armor lines?  If so, would it make more sense to just
include those armor lines as-is in the fast-export stream and let
fast-import process it?  Then we wouldn't have to introduce all the
outputting and parsing of this new <signature-format> field and worry
about the new special "unknown" status.

Is the <git-hash-algo> due to the fact that we have separate `gpgsig`
and `gpgsig-sha256` commit headers and we want to use that information
to avoid writing these headers to the wrong-sized objects (and/or to
avoid checking whether the signature is valid on the wrong-sized
objects)?  If so, could that be spelled out in the docs as well,
especially since it appears that the intent of these headers is left
unimplemented due to not changing fast-import to do anything with
them?

And if <git-hash-algo>'s purpose is to ensure they are only used when
writing same-sized object as what was exported, then...isn't that a
bug?  This series was started because people wanted to be able to do
things like keeping signature even when they are no longer valid or
resigning commits that have a no longer commit signature (among other
uses), but that would mean that if someone exports a sha1 repository
and imports it as sha256, we don't want to ignore the fact that the
sha1 commit was signed for those usecases.

If, however, the <git-hash-algo>'s purpose is merely as a performance
optimization that fast-import can employ in the cases where it checks
for signatures being valid, so it can avoid checking when it know the
hash size isn't even the same, then it could make sense.

But, short of that performance optimization, it's unclear to me
whether we'd lose anything by simply exporting "gpgsig <contents of
signature header from original object, including the armor lines>",
and dropping the <git-hash-algo> and <signature-format> lines
entirely.  Am I missing something?  (I may well be; I don't know much
about signing stuff beyond the very basics, and don't mess with signed
commits or tags much myself.)

[...]
> There are no tests in this v4 and in v3 with both a SHA-1 and a
> SHA-256 signature on the same commit though, as I am not sure yet how
> to best generate a commit with such signatures. Suggestions welcome!

If no suggestions are forthcoming, it feels odd to implement this with
no tests.  Would it make sense to leave it out until we know how to
test it?  (More questions on this below...)

[...]
> diff --git a/Documentation/git-fast-export.adoc b/Documentation/git-fast-export.adoc
> index 43bbb4f63c..64198f2186 100644
> --- a/Documentation/git-fast-export.adoc
> +++ b/Documentation/git-fast-export.adoc
> @@ -50,6 +50,23 @@ resulting tag will have an invalid signature.
>         is the same as how earlier versions of this command without
>         this option behaved.
>  +
> +When exported, a signature starts with:
> ++
> +gpgsig <git-hash-algo> <signature-format>
> ++
> +where <git-hash-algo> is the Git object hash so either "sha1" or
> +"sha256", and <signature-format> is the signature type, so "openpgp",
> +"x509", "ssh" or "unknown".
> ++
> +For example, an OpenPGP signature on a SHA-1 commit starts with
> +`gpgsig sha1 openpgp`, while an SSH signature on a SHA-256 commit
> +starts with `gpgsig sha256 ssh`.

I had a number of comments/questions on this above already.

> ++
> +Currently for a given commit, at most one signature for the SHA-1
> +object and one signature for the SHA-256 object are exported, each
> +with their respective <git-hash-algo> identifier.

Wait..does this mean fast-export is obligated to walk over both all
sha1 commits and all "equivalent" sha256 commits when exporting a
repo?  I thought most operations on the repo would walk over only one
or the other; walking over both seems to be against the spirit of the
"fast" in "fast-export".  Am I missing something?  (Possibly related
question: Does "git log" bother walking over both, or does it only
walk over one?)  Even if this really is wanted by some users,
shouldn't they manually request it rather than making exports slow for
everyone else by default?

> +A warning is
> +emitted for each additional signature found.

Why?  This seems odd to me.  Why not merely export them all, and let
fast-import throw warnings or errors if it sees more than one and is
not yet prepared to handle multiple signatures?

> ++
>  NOTE: This is highly experimental and the format of the data stream may
>  change in the future without compatibility guarantees.
>
> diff --git a/Documentation/git-fast-import.adoc b/Documentation/git-fast-import.adoc
> index 250d866652..db5e5c8da5 100644
> --- a/Documentation/git-fast-import.adoc
> +++ b/Documentation/git-fast-import.adoc
> @@ -445,7 +445,7 @@ one).
>         original-oid?
>         ('author' (SP <name>)? SP LT <email> GT SP <when> LF)?
>         'committer' (SP <name>)? SP LT <email> GT SP <when> LF
> -       ('gpgsig' SP <alg> LF data)?
> +       ('gpgsig' SP <algo> SP <format> LF data)?
>         ('encoding' SP <encoding> LF)?
>         data
>         ('from' SP <commit-ish> LF)?
> @@ -518,13 +518,32 @@ their syntax.
>  ^^^^^^^^
>
>  The optional `gpgsig` command is used to include a PGP/GPG signature
> -that signs the commit data.
> +or other cryptographic signature that signs the commit data.

Good catch.

>
> -Here <alg> specifies which hashing algorithm is used for this
> -signature, either `sha1` or `sha256`.
> +....
> +       'gpgsig' SP <git-hash-algo> SP <signature-format> LF
> +       data
> +....
> +
> +The `gpgsig` command takes two arguments:
> +
> +* `<git-hash-algo>` specifies which Git object format this signature
> +  applies to, either `sha1` or `sha256`.
> +
> +* `<signature-format>` specifies the type of signature, such as
> +  `openpgp`, `x509`, `ssh`, or `unknown`.
> +
> +A commit may have at most one signature for the SHA-1 object format
> +(stored in the "gpgsig" header) and one for the SHA-256 object format
> +(stored in the "gpgsig-sha256" header).

Why?  Does this mean hg-fast-export or hg-fast-import (or a
jj-fast-export or jj-fast-import) wouldn't be allowed to specify
multiple signatures?  The fast-export and fast-import streams are
often used for interoperation with other VCSes, but as far as I can
tell, you're encoding a restriction on what's allowed that isn't an
actual problem for git but just a not-yet-implemented state.  If I've
understood correctly that the restriction is merely due to the current
implementation, perhaps the wording could be changed to not list it as
an encoding restriction, but as a current fast-import limitation?

Also, I'm slightly uncomfortable with "the SHA-1 object format" and
"the SHA-256 object format" because of the fact that these tools are
used for interoperability with similarly-named tools from other VCSes.
I think it'd be better to just treat them equally as "here was/were
some signature(s) on the object in the original repo; importers can
choose what to do with them".

> +
> +See below for a detailed description of the `data` command which
> +contains the raw signature data.
> +
> +Signatures are not yet checked in the current implementation though.

...but what does happen with those signatures?  Dropped?  Kept as-is?
Can we just spell this out a bit more clearly?  e.g.

"Signatures are not checked in the current implementation; they are
used as-is, which may mean the signatures are invalid in the imported
repository."

> -NOTE: This is highly experimental and the format of the data stream may
> -change in the future without compatibility guarantees.
> +NOTE: This is highly experimental and the format of the `gpgsig`
> +command may change in the future without compatibility guarantees.

Good clarification.


I briefly skimmed the implementation and test files, and didn't see
any problems...but I think it probably makes more sense to get aligned
on the goals of the format and how these fields are meant to be used
before diving into those details closer.

Thanks for working on this topic.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux