On 5 July 2025 12:57:29 am IST, "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> wrote:
>On 2025-07-04 at 11:18:12, Aditya Garg wrote:
>> Hi all
>>
>> I just read that git aims to transition to SHA256 by default, and conversion from SHA1 to SHA256 is needed for old
>> repos. I was just curious how will that be achieved.
>>
>> Dumb idea, but maybe we can just encode the existing SHA1 sums' string to SHA256?
>>
>> Eg:
>>
>> $ echo -n 8994f255af5451b6cd1db01ee16d8cf15b9df81e | sha256sum
>> bf8d6d915848377db81ee47e883c0a683b3d86a49ab120191ea1c3d76a30c33f *-
>>
>> so bf8d6d915848377db81ee47e883c0a683b3d86a49ab120191ea1c3d76a30c33f will be our new commit hash.
>
>This would unfortunately still be vulnerable to collisions in SHA-1,
>which is the problem we're trying to avoid.  For instance, if I can
>create two blobs with that SHA-1 hash, then I can also create two blobs
>with the corresponding SHA-256 value, since the input in this case is
>just the SHA-1 value.
>
>The way we do the transition is pretty simple.  Blobs don't change; we
>just hash them with either SHA-1 or SHA-256.  For trees, we re-write all
>of the entries to use the SHA-256 object IDs instead of the SHA-1 object
>IDs and then we hash the result with SHA-256.  And for commits and tags,
>the headers that represent objects (tree, parent, and object) are
>converted in a similar manner and then, again, hashed with SHA-256.
>
>You can actually see how the conversion operates in
>`object-file-convert.c`.  `repo_oid_to_algop` converts an object from
>one format to another based on the loose object map outlined in
>`Documentation/technical/hash-function-transition.adoc`, or the v3 pack
>index functionality which is not yet upstream but is available in my
>`sha256-interop` branch.  In general, the hash function transition
>document explains a lot of the decision behind why we're doing what
>we're doing and how it works.  I have to give credit to Jonathan Nieder
>for writing the document and to many people on the list for helping to
>contribute to it, and I encourage you to read it: it's not too complex.
>

I'll have a look

>So with this approach, the SHA-256 object ID is computed totally
>independently of the SHA-1 object ID but in the exact same way, just
>with SHA-256 object IDs inside.  We already have support for
>SHA-256-only repositories right now: you can do `git init
>--object-format=sha256` and create one, although not all forges and
>tools currently support them.
>
>The process of the conversion when we're in interoperability mode means
>that we can take a repository that's in SHA-1, convert it to SHA-256,
>continue to interoperate with the old SHA-1 version if we like, and
>then, when we no longer want to use SHA-1, simply stick with the SHA-256
>version and avoid using SHA-1 at all.  That's part of what I'm working
>on right now, and I'm pleased to report that I'm making a good amount of
>progress.  If you're able to attend Git Merge this year, either in
>person or remotely, I'll be giving a talk on this topic.

I'll see if remotely is possible. I neither have a US visa for in person, nor it suits my budget.

>
>I'm also planning to open a discussion on the list within the next
>couple days or weeks about some protocol extensions that will be
>necessary to let us fetch, clone, and push all repositories in
>interoperability mode, so please feel free to follow along for that.

Great!
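
For anyone following along, the conversion described above (blobs hashed as-is, tree entries rewritten to SHA-256 object IDs and then hashed) can be sketched roughly as below. This is only an illustration in Python, not the code in `object-file-convert.c`: the helper names `object_id`, `parse_tree` and `convert_tree` are made up for the example, and the in-memory `oid_map` dict is merely a stand-in for the loose object map.

```python
import hashlib


def object_id(obj_type: str, body: bytes, algo: str) -> bytes:
    """Hash a Git object: a '<type> <size>' header, a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).digest()


def parse_tree(body: bytes, oid_len: int):
    """Yield (mode, name, oid) for each '<mode> <name>' NUL '<raw oid>' entry."""
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode, name = body[pos:space], body[space + 1:nul]
        oid = body[nul + 1:nul + 1 + oid_len]
        yield mode, name, oid
        pos = nul + 1 + oid_len


def convert_tree(sha1_body: bytes, oid_map: dict) -> bytes:
    """Rewrite a SHA-1 tree body so each entry carries its SHA-256 object ID."""
    return b"".join(
        mode + b" " + name + b"\0" + oid_map[oid]
        for mode, name, oid in parse_tree(sha1_body, 20)  # SHA-1 IDs are 20 bytes
    )


# A blob's content is the same under both algorithms; only the hash differs.
blob = b"hello\n"
blob_sha1 = object_id("blob", blob, "sha1")
blob_sha256 = object_id("blob", blob, "sha256")

# Hypothetical stand-in for the loose object map: SHA-1 ID -> SHA-256 ID.
oid_map = {blob_sha1: blob_sha256}

# A one-entry tree pointing at the blob, first in SHA-1 form, then converted.
tree_sha1_body = b"100644 hello.txt\0" + blob_sha1
tree_sha256_body = convert_tree(tree_sha1_body, oid_map)

tree_sha1 = object_id("tree", tree_sha1_body, "sha1")
tree_sha256 = object_id("tree", tree_sha256_body, "sha256")

print("blob", blob_sha1.hex(), blob_sha256.hex())
print("tree", tree_sha1.hex(), tree_sha256.hex())
```

Note that the SHA-256 tree ID is computed from the rewritten tree body, never from the SHA-1 ID itself, which is why a SHA-1 collision does not carry over. Commits and tags would follow the same pattern: the IDs in the `tree`, `parent` and `object` headers are swapped through the same map before the result is hashed with SHA-256.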