On 2025-07-04 at 11:18:12, Aditya Garg wrote:
> Hi all
>
> I just read that git aims to transition to SHA256 by default, and conversion from SHA1 to SHA256 is needed for old
> repos. I was just curious how will that be achieved.
>
> Dumb idea, but maybe we can just encode the existing SHA1 sums' string to SHA256?
>
> Eg:
>
> $ echo -n 8994f255af5451b6cd1db01ee16d8cf15b9df81e | sha256sum
> bf8d6d915848377db81ee47e883c0a683b3d86a49ab120191ea1c3d76a30c33f *-
>
> so bf8d6d915848377db81ee47e883c0a683b3d86a49ab120191ea1c3d76a30c33f will be our new commit hash.

This would unfortunately still be vulnerable to collisions in SHA-1,
which is the problem we're trying to avoid. For instance, if I can
create two blobs with that SHA-1 hash, then I can also create two blobs
with the corresponding SHA-256 value, since the input in this case is
just the SHA-1 value.

The way we do the transition is pretty simple. Blobs don't change; we
just hash them with either SHA-1 or SHA-256. For trees, we rewrite all
of the entries to use the SHA-256 object IDs instead of the SHA-1 object
IDs and then we hash the result with SHA-256. And for commits and tags,
the headers that refer to other objects (tree, parent, and object) are
converted in a similar manner and then, again, hashed with SHA-256.

You can actually see how the conversion operates in
`object-file-convert.c`. `repo_oid_to_algop` converts an object ID from
one format to another based on the loose object map outlined in
`Documentation/technical/hash-function-transition.adoc`, or the v3 pack
index functionality, which is not yet upstream but is available in my
`sha256-interop` branch.

In general, the hash function transition document explains a lot of the
decisions behind why we're doing what we're doing and how it works. I
have to give credit to Jonathan Nieder for writing the document and to
many people on the list for helping to contribute to it, and I encourage
you to read it: it's not too complex.
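
To make the blob and tree steps above concrete, here is a minimal Python
sketch. It is not Git's C implementation: the names `object_id`,
`convert_tree`, and `oid_map` are hypothetical, and `oid_map` is only a
stand-in for the loose object map. It assumes the usual object encoding,
where the hash covers a `<type> <size>\0` header followed by the body, and
where each tree entry is `<mode> <name>\0` followed by the raw object ID.

```python
import hashlib


def object_id(algo: str, obj_type: str, body: bytes) -> bytes:
    """Hash an object as '<type> <size>\\0' + body with the given algorithm."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).digest()


def convert_tree(sha1_body: bytes, oid_map: dict[bytes, bytes]) -> bytes:
    """Rewrite a SHA-1 tree body so every entry carries its SHA-256 object ID.

    oid_map is a hypothetical stand-in for the loose object map: it maps raw
    20-byte SHA-1 object IDs to raw 32-byte SHA-256 object IDs.
    """
    out = bytearray()
    pos = 0
    while pos < len(sha1_body):
        nul = sha1_body.index(b"\0", pos)
        out += sha1_body[pos:nul + 1]           # "<mode> <name>\0" is unchanged
        sha1_oid = sha1_body[nul + 1:nul + 21]  # raw SHA-1 ID is 20 bytes
        out += oid_map[sha1_oid]                # swap in the SHA-256 ID
        pos = nul + 21
    return bytes(out)


# A blob is the same bytes either way; only the hash function differs.
content = b"hello\n"
blob_sha1 = object_id("sha1", "blob", content)
blob_sha256 = object_id("sha256", "blob", content)
oid_map = {blob_sha1: blob_sha256}

# A one-entry tree pointing at that blob, in its SHA-1 form.
sha1_tree = b"100644 hello.txt\0" + blob_sha1

# Rewrite the entries, then hash the rewritten tree with SHA-256.
sha256_tree = convert_tree(sha1_tree, oid_map)
print("SHA-1 tree:  ", object_id("sha1", "tree", sha1_tree).hex())
print("SHA-256 tree:", object_id("sha256", "tree", sha256_tree).hex())
```

Note that the SHA-256 tree ID here depends only on the file contents, names,
and modes, never on a SHA-1 value, which is why a SHA-1 collision doesn't
carry over. Commits and tags would be handled along the same lines, by
rewriting the hex object IDs in their tree, parent, and object headers.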

So with this approach, the SHA-256 object ID is computed totally
independently of the SHA-1 object ID but in the exact same way, just
with SHA-256 object IDs inside.

We already have support for SHA-256-only repositories right now: you can
do `git init --object-format=sha256` and create one, although not all
forges and tools currently support them.

In interoperability mode, the conversion process means that we can take
a repository that's in SHA-1, convert it to SHA-256, continue to
interoperate with the old SHA-1 version if we like, and then, when we no
longer want to use SHA-1, simply stick with the SHA-256 version and
avoid using SHA-1 at all. That's part of what I'm working on right now,
and I'm pleased to report that I'm making a good amount of progress.

If you're able to attend Git Merge this year, either in person or
remotely, I'll be giving a talk on this topic. I'm also planning to open
a discussion on the list within the next couple of days or weeks about
some protocol extensions that will be necessary to let us fetch, clone,
and push all repositories in interoperability mode, so please feel free
to follow along for that.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA