On Fri, Jun 20, 2025 at 08:43:07PM +0000, brian m. carlson wrote:
> On 2025-06-20 at 01:56:02, Junio C Hamano wrote:
> > "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
> >
> > > We have a variety of uses of GIT_HASH_SHA1 littered throughout our
> > > code. Some of these really mean to represent specifically SHA-1, but
> > > some actually represent the original hash algorithm used in Git which is
> > > implied by older formats and protocols which do not contain hash
> > > information. For instance, the bundle v1 and v2 formats do not contain
> > > hash algorithm information, and thus SHA-1 is implied by the use of
> > > these formats.
> >
> > Does that mean use of _ORIGINAL is a sign that these places should
> > keep using SHA-1 and should not change?
>
> Yes.

I think this makes sense. There have been a bunch of locations in our
code base where I was left wondering whether the use of SHA1 is
intentional or not. Making these explicit should make it a lot more
obvious which of these categories a callsite falls into.

[snip]

> > > Add a constant for documentary purposes which indicates this value. It
> > > will always be the same as SHA-1, since this is an essential part of
> > > these formats, but its use indicates this particular reason and not any
> > > other reason why SHA-1 might be used.
> >
> > I am not sure what this means. If we use GIT_HASH_SHA1 in such
> > places explicitly (as opposed to GIT_HASH_DEFAULT), isn't it a sign
> > enough that with different versions of Git, that particular code
> > path should keep using SHA-1 no matter what the default is?
>
> If we have a test helper that computes hashes and someone specified
> "sha1" on the command line, that's GIT_HASH_SHA1. Someone said, "I'd
> like to use SHA-1." Similarly, in the reftable code, we can read the
> byte value indicating that the reftable is in SHA-1 and that's an
> explicit decision.
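To make the distinction concrete, here is a rough sketch in C of how the
two constants might be used side by side. It assumes GIT_HASH_ORIGINAL is
a plain alias of GIT_HASH_SHA1; the numeric values and all helper names
(algo_from_name, algo_for_legacy_bundle, algo_for_repository) are
hypothetical and not copied from Git's hash.h:

```c
#include <string.h>

/* Illustrative values, assumed to follow the numbering style of Git's hash.h. */
#define GIT_HASH_UNKNOWN 0
#define GIT_HASH_SHA1 1
#define GIT_HASH_SHA256 2

/*
 * Assumed alias: the algorithm implied by formats and protocols that carry
 * no hash information. Same value as GIT_HASH_SHA1, different intent.
 */
#define GIT_HASH_ORIGINAL GIT_HASH_SHA1

/* Hypothetical helper: the caller explicitly named an algorithm. */
static int algo_from_name(const char *name)
{
	if (!strcmp(name, "sha1"))
		return GIT_HASH_SHA1; /* someone asked for SHA-1 */
	if (!strcmp(name, "sha256"))
		return GIT_HASH_SHA256;
	return GIT_HASH_UNKNOWN;
}

/* Hypothetical helper: bundle v1 and v2 do not record a hash algorithm. */
static int algo_for_legacy_bundle(int version)
{
	if (version == 1 || version == 2)
		return GIT_HASH_ORIGINAL; /* implied by the format, nobody chose it */
	return GIT_HASH_UNKNOWN; /* other versions are not covered by this sketch */
}

/* Hypothetical helper: objectformat is NULL when extensions.objectformat is unset. */
static int algo_for_repository(const char *objectformat)
{
	if (!objectformat)
		return GIT_HASH_ORIGINAL; /* compatible legacy default */
	return algo_from_name(objectformat);
}
```

Every path that ends up with SHA-1 behaves identically at runtime; the
alias only lets a reader tell "explicitly requested" apart from "implied
because the format predates hash agility".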
Tiny nit: even for the reftable format it is not always clear whether
it is GIT_HASH_SHA1 or GIT_HASH_ORIGINAL. There are two versions of the
format:

  - The first version implicitly uses SHA1, so this would be
    GIT_HASH_ORIGINAL.

  - The second version specifies the hash format, so it would be either
    GIT_HASH_SHA1 or GIT_HASH_SHA256.

But again, I think that this distinction is actually useful.

> If we default to SHA-1 because nobody specified extensions.objectformat,
> then that's GIT_HASH_ORIGINAL. Nobody made a decision or opted into an
> algorithm; we just didn't think hard enough about cryptographic agility
> in the original Git and we assumed SHA-1.
>
> They're both the same numeric constant here and always will be (even if,
> in a future version of Git, we get rid of SHA-1 altogether and we
> otherwise die on that code). But there's a difference in intention: one
> explicitly stated SHA-1 as opposed to a different algorithm and one just
> got a default because that's the compatible legacy behaviour.

Yup.

Patrick