Re: [PATCH 02/10] hash: add a constant for the original hash algorithm

On 2025-06-20 at 01:56:02, Junio C Hamano wrote:
> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
> 
> > We have a variety of uses of GIT_HASH_SHA1 littered throughout our
> > code.  Some of these really mean to represent specifically SHA-1, but
> > some actually represent the original hash algorithm used in Git which is
> > implied by older formats and protocols which do not contain hash
> > information.  For instance, the bundle v1 and v2 formats do not contain
> > hash algorithm information, and thus SHA-1 is implied by the use of
> > these formats.
> 
> Does that mean use of _ORIGINAL is a sign that these places should
> keep using SHA-1 and should not change?

Yes.

> I am having a hard time guessing/assessing the value of having _ORIGINAL
> that is a synonym for _SHA1; with redirection, it pretends as if the
> underlying value can be updated from SHA-1 to SHA-256 (and that is
> the very intention behind GIT_HASH_DEFAULT symbol that gives us a
> level of indirection), but it is hard to imagine we would ever want
> to change what _ORIGINAL means, as that word talks about a historical
> fact that will never change over time.

I agree.  _ORIGINAL indicates a use of SHA-1 that is a historical fact
and a legacy decision, as opposed to one specified explicitly.

For instance, if we're setting the algorithm for bundle v1 and v2, then
we'd use _ORIGINAL because those formats did not specify a hash
algorithm when they were designed and, for legacy reasons, we cannot
change that fact.  However, if a user specified @object-format=sha1 with
bundle v3, then we'd use _SHA1, since that was an explicit, documented
decision.
Similarly, _SHA1 represents extensions.objectFormat=sha1, which is an
intentional decision to use the older algorithm.
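
As a rough sketch of the bundle case above: the enum values and the
bundle_hash_algo() helper below are stand-ins invented for illustration,
not Git's actual definitions or code.  Treating a v3 bundle that carries
no @object-format as the legacy default is also an assumption on my
part.

```c
#include <assert.h>
#include <string.h>

/* Stand-in definitions so the sketch compiles on its own. */
enum {
	GIT_HASH_UNKNOWN = 0,
	GIT_HASH_SHA1 = 1,
	GIT_HASH_SHA256 = 2,
	/* Same value as SHA-1; the name records "implied by a legacy format". */
	GIT_HASH_ORIGINAL = GIT_HASH_SHA1
};

/*
 * Hypothetical helper: choose the hash algorithm for a bundle.
 * object_format is the value of an @object-format capability, or NULL
 * when the header carried none.
 */
static int bundle_hash_algo(int version, const char *object_format)
{
	assert(version >= 1);

	/* v1 and v2 cannot name an algorithm, so SHA-1 is implied. */
	if (version < 3)
		return GIT_HASH_ORIGINAL;

	/* Assumption: v3 without @object-format also gets the legacy default. */
	if (!object_format)
		return GIT_HASH_ORIGINAL;

	/* Here somebody wrote the algorithm down explicitly. */
	if (!strcmp(object_format, "sha1"))
		return GIT_HASH_SHA1;
	if (!strcmp(object_format, "sha256"))
		return GIT_HASH_SHA256;

	return GIT_HASH_UNKNOWN;
}
```

Both bundle_hash_algo(2, NULL) and bundle_hash_algo(3, "sha1") return
the same number; only the constant named at each return site tells the
reader why SHA-1 was picked.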

> > Add a constant for documentary purposes which indicates this value.  It
> > will always be the same as SHA-1, since this is an essential part of
> > these formats, but its use indicates this particular reason and not any
> > other reason why SHA-1 might be used.
> 
> I am not sure what this means.  If we use GIT_HASH_SHA1 in such
> places explicitly (as opposed to GIT_HASH_DEFAULT), isn't it a sign
> enough that with different versions of Git, that particular code
> path should keep using SHA-1 no matter what the default is?

If we have a test helper that computes hashes and someone specified
"sha1" on the command line, that's GIT_HASH_SHA1.  Someone said, "I'd
like to use SHA-1."  Similarly, in the reftable code, we can read the
byte value indicating that the reftable is in SHA-1 and that's an
explicit decision.

If we default to SHA-1 because nobody specified extensions.objectFormat,
then that's GIT_HASH_ORIGINAL.  Nobody made a decision or opted into an
algorithm; we just didn't think hard enough about cryptographic agility
in the original Git and we assumed SHA-1.

They're both the same numeric constant here and always will be (even if,
in a future version of Git, we get rid of SHA-1 altogether and we
otherwise die on that code).  But there's a difference in intention: one
explicitly stated SHA-1 as opposed to a different algorithm and one just
got a default because that's the compatible legacy behaviour.
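
To make the "same numeric constant" point concrete, here is a small
sketch.  Again, the enum values and hash_algo_name() are assumptions
made up for this example rather than Git's real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in definitions so the sketch compiles on its own. */
enum {
	GIT_HASH_UNKNOWN = 0,
	GIT_HASH_SHA1 = 1,
	GIT_HASH_SHA256 = 2,
	GIT_HASH_ORIGINAL = GIT_HASH_SHA1
};

/* The two names must never diverge; check that at compile time (C11). */
static_assert(GIT_HASH_ORIGINAL == GIT_HASH_SHA1,
	      "GIT_HASH_ORIGINAL is always SHA-1");

/* Hypothetical lookup: map an algorithm constant to its name. */
static const char *hash_algo_name(int algo)
{
	switch (algo) {
	case GIT_HASH_SHA1: /* also matches GIT_HASH_ORIGINAL */
		return "sha1";
	case GIT_HASH_SHA256:
		return "sha256";
	default:
		return NULL;
	}
}
```

Because the values are identical, no code at run time can tell the two
apart (a separate "case GIT_HASH_ORIGINAL:" label would not even compile
as a duplicate); the distinction exists only for whoever reads the
source.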
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA
