Re: Incremental Backup of repositories using Git

On Thu, May 08, 2025 at 08:47:47PM +0200, Michal Suchánek wrote:

> If you have one of those filesystems that support deduplication on
> filesystem level you could make each snapshot as a full repository with
> all objects unpacked, and the filesystem would deduplicate the objects
> for you.
> 
> The downside is that you have no way to do multiple full backups this
> way, and you would have to use something else for that (such as those
> bundles, or plain archiving the repository as files in a tar archive or
> such).

This is tempting, but I suspect that storing the objects unpacked will
become infeasibly large, because you lose the delta compression that
packfiles give you. You can compare the on-disk and uncompressed sizes
of the objects in a repo like this:

  git cat-file --batch-all-objects --unordered \
               --batch-check='%(objectsize:disk) %(objectsize)' |
  perl -alne '
    $disk += $F[0];
    $true += $F[1];
    END {
      print "$true / $disk = ", int($true / $disk);
    }
  '

It's not entirely fair because the "true" size is missing out on zlib
compression that loose objects would get. But that's at best going to be
about 4:1 (and in practice worse, since trees are full of sha1 hashes
that don't compress very well).

In my copy of linux.git, that yields ~135G versus ~2.4G, for a factor of
56. Even if we grant 4:1 compression from zlib, that's still inflating
your on-disk repository by a factor of 14.

If you have the patience, you can run:

  git cat-file --batch-all-objects --unordered --batch | gzip | wc -c

to get a better sense of what it looks like with the extra deflate (this
is cheating a bit, because it will find cross-object compression
opportunities which would not be there in loose object storage, but it
should get you in the right ballpark).

You're probably also paying some inode and block-size costs with loose
objects (the ~1K trees at the root of linux.git each take up 4K or
whatever on disk as individual loose objects).
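
To put a rough number on that rounding overhead, here is a minimal
sketch. It assumes 4K filesystem blocks, and block_rounded_total is a
hypothetical helper name, not anything that ships with Git. It reads one
object size per line and sums them after rounding each up to a whole
number of blocks:

```shell
# Hypothetical helper: sum sizes from stdin (one size in bytes per line),
# rounding each one up to a multiple of the block size given as $1.
# The 4096 default is an assumption; check your filesystem's real value.
block_rounded_total() {
  awk -v bs="${1:-4096}" '
    NF { total += int(($1 + bs - 1) / bs) * bs }
    END { printf "%.0f\n", total }
  '
}
```

You could feed it from something like:

  git cat-file --batch-all-objects --unordered \
               --batch-check='%(objectsize)' |
  block_rounded_total 4096

That uses uncompressed sizes, so it overstates the cost of large
objects, but for the many tiny objects the rounding is what dominates.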

So you're probably much better off with some strategy involving .keep
files. I.e., make a good big pack and mark it with .keep, so that it is
retained forever.
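
A minimal sketch of that idea, under some assumptions: keep_current_packs
is a hypothetical helper name, and it relies on the default behavior
where "git repack -a -d" leaves packs that have a matching .keep file
alone (i.e., repack.packKeptObjects is not enabled):

```shell
# Hypothetical helper: repack the repository at $1 (default: the current
# directory) and mark every resulting pack with a .keep file.
# Runs in a subshell so the caller's working directory is untouched.
keep_current_packs() (
  cd "${1:-.}" || exit 1
  # Put everything not already in a kept pack into one new pack.
  git repack -a -d -q || exit 1
  packdir=$(git rev-parse --git-path objects/pack) || exit 1
  for pack in "$packdir"/pack-*.pack; do
    [ -e "$pack" ] || continue
    # touch rather than truncate, so an existing .keep message survives.
    touch "${pack%.pack}.keep"
  done
)
```

After running it once, later repacks should only gather the objects
that arrived since then into new packs, and the kept pack can be copied
to the backup a single time.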

-Peff



