Re: Semantics of change IDs (Re: Gerrit, GitButler, and Jujutsu projects collaborating on change-id commit footer)

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> · Mon, 12 May 2025 22:04:49 +0000

On 2025-05-12 at 21:43:46, Martin von Zweigbergk wrote:
> On Mon, 12 May 2025 at 14:07, Nico Williams <nico@xxxxxxxxxxxxxxxx> wrote:
> >
> > How is this stable ID constructed?
> 
> It's just random bytes (16 when using the Git backend, 32 in the
> Google backend).
> 
> > How would things other than jj construct these?  We spent many messages
> > trying to work that out and in my estimate that wasn't settled.
> 
> Random bytes has worked well for jj.

I would like to suggest that we use a deterministic approach.  People
rely on Git commits being deterministic, including in my stash
import/export series[0].  In addition, it's important to avoid any
allegations of side channels or leaking information in commits, which
would be a concern in many environments and which a deterministic
approach would avoid[1].

I'd suggest a simple SHA-256 hash of the original commit data (used for
both SHA-1 and SHA-256 commits, though it would change to a new hash if
Git ever added one) or an HMAC-SHA-256 with a fixed and documented key.
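
A minimal sketch of the two options above, in Python for brevity.  The
key value, the function names, and the choice of the raw commit object
body as "original commit data" are assumptions made for illustration;
nothing in this thread settles them.

```python
import hashlib
import hmac

# Hypothetical fixed, documented key (an assumption, not a proposal).
CHANGE_ID_KEY = b"git-change-id-v1"


def change_id_sha256(commit_data: bytes) -> str:
    """Option 1: plain SHA-256 over the original commit data."""
    return hashlib.sha256(commit_data).hexdigest()


def change_id_hmac(commit_data: bytes,
                   key: bytes = CHANGE_ID_KEY) -> str:
    """Option 2: HMAC-SHA-256 over the same data with a fixed key."""
    return hmac.new(key, commit_data, hashlib.sha256).hexdigest()


# Example commit body; the tree ID is Git's well-known empty tree.
commit = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1747087489 +0000\n"
    b"committer A U Thor <author@example.com> 1747087489 +0000\n"
    b"\n"
    b"Initial commit\n"
)

# The same input always yields the same ID, on any machine.
print(change_id_sha256(commit))
print(change_id_hmac(commit))
```

Either way the ID is a pure function of the commit data, so it carries
no bytes that the commit itself does not already determine.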

I would also recommend a config option to avoid creating these IDs for
those who don't want them included for privacy reasons.  I expect to set
such an option, for instance.

[0] That series will definitely require that change IDs be disabled when
creating commits, since the goal is to ensure bit-for-bit
reproducibility between different Git versions so that users can
immediately tell if the stash history is identical.

[1] For instance, random bytes are an easy way to leak keys or other
credentials without people noticing, just by pushing an
innocuous-looking commit.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA