Re: What kind of help is needed for SHA-256 work in the next ~year?

On 2025-09-12 at 17:59:06, Emily Shaffer wrote:
> brian has been working on the SHA-256 implementation and now on the
> interop, pretty much solo, for quite some time. I realize that it's a
> bit late in the party to ask, but as we're talking about switching the
> default for new repositories in Git 3.0, I think it is past time for
> the rest of the project to pitch in if we can.

I do very much appreciate you asking this.

> What kind of help would be useful to you at this point, brian? How
> much of the work is planned and ready for you to delegate to someone
> else (and what's the timeline like, if you have one)? Do you need help
> with testing any parts of the existing code in scaled scenarios? My
> understanding is that you have a roadmap to guide your own work, but
> if it's not shareable, is that something you could use some program
> management help with? Anything like that?

I have about 93 patches in my `sha256-interop` branch right now, which
is based off v4 of Patrick's Rust series.  Much of the functionality
works: the legacy loose object maps (which I'm replacing with the new
binary format), pack index v3, full clones and pushes, and some shallow
functionality.

A lot of what I need help with is getting these patches production ready
and sent to the list.  (Some of them are clearly marked as WIP with a
comment why.)  For instance, pack index v3 works just fine, but we need
more tests for it.  I haven't done any sort of scale testing yet,
either, so if that's something we want, then help with that would be
great.

Similarly, even when the binary loose object maps work, we'll still need
to prune old objects from them and compact the maps as part of `git gc`,
which would be something I'd appreciate help with.

I do have permission to work on this as part of my job (starting in
about October), but we want to release in a year and I'm expecting at
least 200 (if not 300 or more) total patches for this project.  What I
don't want to do is try to shovel several 50-patch series in at the last
minute, which would be unkind to reviewers and not produce the best
quality code, so trying to get the existing patches cleaned up and in
relatively soon would help us make more progress at a more leisurely
pace.  I also still have other duties at work (after all, my
team is responsible for serving your Git traffic, which I think we'd all
like to continue), so assistance would be super helpful.

There is also a giant heap of broken tests when run in compatibility
mode.  Some of those tests are broken because, say, we lack support for
partial clone, and we'll fix those by implementing partial clone. But
there are lots of tests that are broken for boring reasons, such as the
fact that in compatibility mode we can't accept broken objects (because
they can't be mapped into the other algorithm), and those need to be
marked or fixed accordingly.  Getting those marked or fixed would be a
super helpful contribution (I even have a test prerequisite for this
purpose), as would fixing other routine test failures.

I have some tests for fetching and pushing in interoperability mode
which will run even when the entire testsuite is run in single-hash
mode, but I think we're also going to want more tests: HTTP, the Git
protocol, protocols v0 and v2, single-hash servers and dual-hash clients
and the reverse, unsupported cases[0], and so on.  That would also be very
helpful since it will help us make sure our changes are very robust.
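
To give a rough sense of how many combinations that list implies, here
is a minimal sketch in Python that enumerates them.  The axis names and
values are illustrative shorthand for the cases named above, not
identifiers from Git's test suite, and the unsupported cases from [0]
are deliberately left out:

```python
from itertools import product

# Hypothetical axis labels; none of these are identifiers from Git's test suite.
TRANSPORTS = ["http", "git"]
PROTOCOL_VERSIONS = ["v0", "v2"]
# (server mode, client mode) pairings named in the text above.
HASH_PAIRINGS = [
    ("single-hash", "dual-hash"),
    ("dual-hash", "single-hash"),
]


def interop_test_matrix():
    """Return every (transport, protocol, server, client) combination."""
    return [
        (transport, version, server, client)
        for transport, version, (server, client) in product(
            TRANSPORTS, PROTOCOL_VERSIONS, HASH_PAIRINGS
        )
    ]


if __name__ == "__main__":
    for case in interop_test_matrix():
        print(case)
```

Even with only these three axes, that is eight configurations per
fetch or push scenario, before adding shallow or partial variants.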

We'll also need to implement partial clone and submodule support.
Submodules are especially tricky because to look up the object mapping,
we also need the submodule to be in interoperability mode.  And, because
people are absolutely going to want this kind of thing, we ideally need
some script or command to convert repositories from single-hash (say,
SHA-1) to dual-hash (interoperability) mode, taking into account
submodules (which must be done _before_ the main repo) and all the other
edge cases, whether that's in place[1] or to a separate (bare or non-bare)
repository[2].  Someone picking that work up would be greatly appreciated.
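
The ordering constraint (submodules before the repository that contains
them) amounts to a post-order walk of the submodule tree.  A minimal
sketch in Python, assuming the nesting is already known as a plain
mapping; `conversion_order` and `submodules_of` are hypothetical names,
and no actual conversion or Git command is involved:

```python
def conversion_order(repo, submodules_of):
    """Return repositories ordered so every submodule precedes its parent.

    `submodules_of` maps a repository name to the names of its direct
    submodules (a hypothetical, pre-computed structure).
    """
    order = []
    seen = set()

    def visit(name):
        # A submodule shared by several parents is only converted once.
        if name in seen:
            return
        seen.add(name)
        for child in submodules_of.get(name, []):
            visit(child)
        # Appended only after all of its submodules have been appended.
        order.append(name)

    visit(repo)
    return order


layout = {"main": ["libfoo", "libbar"], "libfoo": ["vendored"]}
print(conversion_order("main", layout))
# ['vendored', 'libfoo', 'libbar', 'main']
```

A real tool would additionally have to discover the submodules itself
and cope with the other edge cases mentioned above; this only shows the
order in which the conversions would need to run.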

And there's still more beyond that as well.  Some of this I can pick up,
but assistance would of course be appreciated.

I'm going to spend the next week kind of tying up some loose ends in my
current work and getting things in my branch in a state where someone
could pick some of this work up.  I'll also write up a complete list of
what still needs to be done in case folks would like to help out and
send it to the list in reply to this thread.

As for project management, I would be fine with simply using
GitHub/GitLab/Forgejo issues/projects in an otherwise empty repository
for tracking who's working on what if that's acceptable to others.  I do
feel some sort of tracking like this would be useful if we have multiple
contributors, since it will help avoid accidentally working on the same
thing as someone else, but I'm not super picky as to what it is.

> I can't guarantee that Google will be able to jump on and help right
> away, but at least understanding what needs doing is a good start for
> me to be able to ask around - especially if we're looking ahead to
> 2026, that gives me more room to try and get help. I thought to ask on
> the list instead of mailing brian directly because I assume that's the
> case for the other corporate contributors to the project, too ;)

I appreciate the offer.  I think with our desired timeframe, there's
definitely enough work for two or three, and possibly more, people.  I
would be very grateful for any assistance that can be provided here.

[0] For instance, we cannot do a shallow or partial clone to a dual-hash
client unless the server supports mapping using both algorithms, since
those types of clone have incomplete history and therefore the client
cannot perform all of the conversion themselves.  We will want to
provide a nice error message to the user and some documentation for this
case.
[1] In place is ideal, since that could also be useful for forges who
want to do this conversion, but any command is better than no command.
[2] I think a shell command would be fine for this, although Dscho and
the other Windows folks may not love the performance.  This might also
be an exciting opportunity to write some Rust if the authors prefer that
approach.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA
