On 2025-06-20 at 01:56:02, Junio C Hamano wrote:
> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
>
> > We have a variety of uses of GIT_HASH_SHA1 littered throughout our
> > code.  Some of these really mean to represent specifically SHA-1, but
> > some actually represent the original hash algorithm used in Git which is
> > implied by older formats and protocols which do not contain hash
> > information.  For instance, the bundle v1 and v2 formats do not contain
> > hash algorithm information, and thus SHA-1 is implied by the use of
> > these formats.
>
> Does that mean use of _ORIGINAL is a sign that these places should
> keep using SHA-1 and should not change?

Yes.

> I am having a hard time guessing/assessing the value of having _ORIGINAL
> that is a synonym for _SHA1; with redirection, it pretends as if the
> underlying value can be updated from SHA-1 to SHA-256 (and that is
> the very intention behind GIT_HASH_DEFAULT symbol that gives us a
> level of indirection), but it is hard to imagine we would ever want
> to change what _ORIGINAL means, as that word talks about a historical
> fact that will never change over time.

I agree.  _ORIGINAL indicates that this is a use of SHA-1 which is a
historical fact and a legacy decision, as opposed to one specified
explicitly.

For instance, if we're setting the algorithm for bundle v1 and v2, then
we'd use _ORIGINAL because those formats did not specify a hash value
when they were designed and, for legacy reasons, we cannot change that
fact.  However, if with bundle v3 a user specified @object-format=sha1,
then we'd use _SHA1, since that was an explicit, documented decision.
Similarly, _SHA1 represents extensions.objectFormat=sha1, which is an
intentional decision to use the older algorithm.

> > Add a constant for documentary purposes which indicates this value.  It
> > will always be the same as SHA-1, since this is an essential part of
> > these formats, but its use indicates this particular reason and not any
> > other reason why SHA-1 might be used.
>
> I am not sure what this means.  If we use GIT_HASH_SHA1 in such
> places explicitly (as opposed to GIT_HASH_DEFAULT), isn't it a sign
> enough that with different versions of Git, that particular code
> path should keep using SHA-1 no matter what the default is?

If we have a test helper that computes hashes and someone specified
"sha1" on the command line, that's GIT_HASH_SHA1.  Someone said, "I'd
like to use SHA-1."  Similarly, in the reftable code, we can read the
byte value indicating that the reftable is in SHA-1 and that's an
explicit decision.

If we default to SHA-1 because nobody specified extensions.objectformat,
then that's GIT_HASH_ORIGINAL.  Nobody made a decision or opted into an
algorithm; we just didn't think hard enough about cryptographic agility
in the original Git and we assumed SHA-1.

They're both the same numeric constant here and always will be (even if,
in a future version of Git, we get rid of SHA-1 altogether and we
otherwise die on that code).  But there's a difference in intention: one
explicitly stated SHA-1 as opposed to a different algorithm and one just
got a default because that's the compatible legacy behaviour.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA
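
The distinction described above can be sketched in C roughly as follows.
This is only an illustration, not the code from the patch: the numeric
values are stand-ins, and resolve_algo() and algo_by_name() are
hypothetical helpers made up for this example.  The point is that
_ORIGINAL and _SHA1 compare equal, but the call site records whether
SHA-1 was asked for or merely implied.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in identifiers for this sketch (values are assumptions). */
enum {
	GIT_HASH_UNKNOWN = 0,
	GIT_HASH_SHA1 = 1,
	GIT_HASH_SHA256 = 2,
};

/*
 * Always the same value as GIT_HASH_SHA1.  Using this name documents
 * that SHA-1 is implied by a legacy format or a missing setting,
 * not chosen explicitly.
 */
#define GIT_HASH_ORIGINAL GIT_HASH_SHA1

/* Hypothetical helper: someone explicitly named an algorithm. */
static int algo_by_name(const char *name)
{
	if (!strcmp(name, "sha1"))
		return GIT_HASH_SHA1;	/* explicit: "I'd like to use SHA-1" */
	if (!strcmp(name, "sha256"))
		return GIT_HASH_SHA256;
	return GIT_HASH_UNKNOWN;
}

/*
 * Hypothetical helper: object_format is the configured or
 * header-supplied value, or NULL if nothing was specified.
 */
static int resolve_algo(const char *object_format)
{
	if (!object_format)
		return GIT_HASH_ORIGINAL;	/* nobody decided; legacy default */
	return algo_by_name(object_format);
}
```

With this sketch, resolve_algo(NULL) and resolve_algo("sha1") return the
same number, yet a reader of the source can tell the two reasons apart.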