Re: Making bit-by-bit reproducible Git Bundles?

Jeff King <peff@xxxxxxxx> writes:

>   [now without threading]
>   $ git -c pack.threads=1 bundle create --no-progress - HEAD | sha1sum
>   c897caf9c68d2c37d997d3973196886af3b0b46e  -
>
>   [and we can do it again. yay!]
>   $ git -c pack.threads=1 bundle create --no-progress - HEAD | sha1sum
>   c897caf9c68d2c37d997d3973196886af3b0b46e  -

Those are the commands I use, but they don't produce the same hash in
two different 'git clone's.  I tried running 'git clone' with the same
'-c pack.threads=1' but it made no difference.

>   2. There is no way to pass pack-objects options down through
>      git-bundle. So you'd have to either assemble the bundle yourself,
>      or perhaps generate a stable on-disk pack state, and then generate
>      the bundle. Perhaps something like:
>
>        # make one single pack, with no reuse, using the default options
>        git -c pack.threads=1 repack -adf

Yay!  You may have solved this for me.  I have to verify this a bit
more, but this looks promising (these are two different git clones):

jas@kaka:~/t/gnulib-1$ git -c pack.threads=1 repack -adf
jas@kaka:~/t/gnulib-1$ git -c 'pack.threads=1' bundle create gnulib.bundle --all
jas@kaka:~/t/gnulib-1$ sha256sum gnulib.bundle 
c780bb07501cf016e702fbe3f52704b4f64edd6882c13c9be0f3f114c894e890  gnulib.bundle
jas@kaka:~/t/gnulib-1$ cd ../gnulib-2
jas@kaka:~/t/gnulib-2$ git -c pack.threads=1 repack -adf
jas@kaka:~/t/gnulib-2$ git -c 'pack.threads=1' bundle create gnulib.bundle --all
jas@kaka:~/t/gnulib-2$ sha256sum gnulib.bundle 
c780bb07501cf016e702fbe3f52704b4f64edd6882c13c9be0f3f114c894e890  gnulib.bundle
jas@kaka:~/t/gnulib-2$ 
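
To make the comparison above easier to repeat, the two steps can be
wrapped in a small helper.  This is only a sketch: the function name
bundle_checksum is made up here, it assumes sha256sum from coreutils,
and it has only the evidence of the transcript above behind it, so it
is not a guarantee that every repository or Git version behaves the
same way.

```shell
#!/bin/sh
# bundle_checksum REPO OUT
# Hypothetical helper (not a Git command): repack REPO into one pack,
# write a bundle of all refs to OUT, and print the bundle's SHA-256.
# OUT should be an absolute path, since we change directory into REPO.
bundle_checksum() (
    cd "$1" || exit 1
    # One single pack, no delta reuse, single-threaded delta search.
    git -c pack.threads=1 repack -adf -q || exit 1
    # Build the bundle from that freshly written pack state.
    git -c pack.threads=1 bundle create "$2" --all || exit 1
    # Print only the hex digest, not the file name.
    sha256sum "$2" | cut -d' ' -f1
)
```

Running it on two clones, e.g. bundle_checksum ~/t/gnulib-1 /tmp/1.bundle
and bundle_checksum ~/t/gnulib-2 /tmp/2.bundle, should print the same
digest twice if the bundles are bit-by-bit identical.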

> So I think it's possible, but I doubt it's very ergonomic. You're
> probably better off using some checksum over Git's logical model, rather
> than the stored bytes. The obvious one is that a single Git commit hash
> unambiguously represents the whole tree and all of history leading up to
> it, because of the chains of hashes.
>
> But that implies you trust Git's object hash algorithm.

Right -- I think anything but bit-by-bit identical files is going to be
too complex to verify.

>   # print all commits in topological order, with ties broken by
>   # committer date, which should be stable. And then follow up with the
>   # trees and blobs for each.
>   git rev-list --topo-order --objects HEAD >objects
>
>   # now print the contents of each object (preceded by its name, type,
>   # and length, so there's no chance of weird prepending or appending
>   # attacks). We cut off the path information from rev-list here, since
>   # the ordered set of objects is all we care about.
>   cut -d' ' -f1 objects |
>   git cat-file --batch >content
>
>   # and then take a hash over that content; this will be unambiguous.
>   sha256sum <content

How to read this output?  Could this be made git bundle compatible?
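
As far as I understand the cat-file documentation, each record that
'git cat-file --batch' writes is a header line of the form
"<object name> <type> <size>", followed by <size> bytes of raw object
content and a newline.  A sketch for looking at just the header lines,
using --batch-check instead of --batch (object_headers is a name I made
up for illustration):

```shell
#!/bin/sh
# object_headers REPO
# Hypothetical helper: list "<object name> <type> <size>" for every
# object reachable from HEAD, in the same order as the quoted recipe.
object_headers() {
    git -C "$1" rev-list --topo-order --objects HEAD |
    # Drop the path that rev-list prints after tree and blob names.
    cut -d' ' -f1 |
    # --batch-check prints only the header line, not the content.
    git -C "$1" cat-file --batch-check
}
```

This only makes the stream easier to inspect; it does not turn it into
something 'git bundle' can consume.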

But if the repack approach above solves it, this part isn't necessary.

/Simon
