RE: Incremental Backup of repositories using Git

Hi Justin/Jeff
(Resending; my previous email was rejected due to HTML content.)
I researched this further. If we keep the previous state of the repo, we can run git fetch --all and take storage-level incremental backups from the changed objects under .git/objects (the new .pack files, with auto gc disabled). However, it is not feasible for us to keep a clone of each repo around just for the incremental backups.
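
For reference, a rough sketch of that fetch-based approach, assuming a bare mirror clone is kept between backups. The function name is made up for illustration; the config overrides are there so that fetch neither repacks nor explodes small fetches into loose objects.

```shell
# Sketch only: assumes "$1" is a bare mirror clone kept between backups.
fetch_and_list_new_packs () {
  repo=$1
  before=$(mktemp) || return 1
  after=$(mktemp) || return 1

  # remember which packs exist before the fetch
  ls "$repo"/objects/pack/*.pack 2>/dev/null | sort >"$before"

  # gc.auto=0 and maintenance.auto=false keep fetch from repacking;
  # fetch.unpackLimit=1 stores even a small fetch as a pack
  # instead of unpacking it into loose objects
  git -C "$repo" -c gc.auto=0 -c maintenance.auto=false \
    -c fetch.unpackLimit=1 fetch -q --all || return 1

  # print the packs that appeared during the fetch
  ls "$repo"/objects/pack/*.pack 2>/dev/null | sort >"$after"
  comm -13 "$before" "$after"
  rm -f "$before" "$after"
}
```

The printed .pack files (and their .idx companions) would be the candidates for a storage-level incremental copy.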

I also looked into running git fetch-pack in a git init --bare repo, which might have helped here, but it does not work as expected:
1. It doesn't work over https ->
$ git fetch-pack --thin --shallow-exclude=28307688f7344018cad46c310826a82041b39b8d https://github.com/elastic/elasticsearch refs/heads/main
fatal: protocol 'https' is not supported
2. Over ssh it fails with "fatal: the remote end hung up unexpectedly" ->
$ git fetch-pack --thin --shallow-exclude=28307688f7344018cad46c310826a82041b39b8d git@xxxxxxxxxx:elastic/elasticsearch.git refs/heads/main
fatal: the remote end hung up unexpectedly
Is what I need here (fetching new objects without having the previous objects present locally) technically possible with the git CLI or the libgit2 library?
If it helps, we can keep metadata recording which commits were backed up for each ref in the previous backup.

As an alternative, I tried downloading commit blobs via API requests, but that hits rate limits too often and is far slower than the git protocol.

-----Original Message-----
From: Jeff King <peff@xxxxxxxx> 
Sent: 09 May 2025 00:09
To: Abhishek Dalmia <adalmia@xxxxxxxxxxxxx>
Cc: Justin Tobler <jltobler@xxxxxxxxx>; Akash S <akashs@xxxxxxxxxxxxx>; git@xxxxxxxxxxxxxxx; Adithya Urugudige <aurugudige@xxxxxxxxxxxxx>
Subject: Re: Incremental Backup of repositories using Git

On Thu, May 08, 2025 at 10:24:55AM +0000, Abhishek Dalmia wrote:

> I ran into an edge case while testing incremental backups with git 
> bundle. If a commit is created with a timestamp earlier than the 
> latest full or incremental backup, it can be excluded from the next 
> bundle due to the --since parameter even if there is a buffer.

Yeah, I don't think you want to use "--since" here, since it is about commit timestamps. You care about the state of the refs at a particular time. Or more accurately, you care that you have captured a particular ref state previously.

So ideally you'd snapshot that state in an atomic way, feed it as the "current" state when doing a bundle, and then save it for later. You can easily create such a snapshot with for-each-ref, but I don't think git-bundle has a way to provide the exact set of ref tips and their values (it just takes rev-list arguments, and wants to resolve the refs themselves).

You could probably get away with just creating a bundle with the current state, and then pulling the snapshot values from the created bundle.
Something like this:

  # for initial backup
  if ! test -e last-bundle-snapshot; then
    >last-bundle-snapshot
  fi

  # mark everything from last as seen, so we do not include it,
  # along with --all (or your choice of refs) to pick up everything
  # we have currently
  sed -e 's/^/^/' <last-bundle-snapshot |
  git bundle create out.bundle --all --stdin

  # and now save that ref state for next time; this is inherently
  # peeking at the bundle format.
  sed -ne '
        # quit when we see end of header
        /^$/q;
        # drop comments and old negatives; copy only first word (the oid)
        s/^\([^-#][^ ]*\).*/\1/p;
  ' <out.bundle >last-bundle-snapshot

Or alternatively, instead of using git-bundle at all, you could just store a collection of ref snapshots (from "for-each-ref") and thin packs (from "pack-objects --thin --stdout", fed from the old snapshot and the new). Which is really all that bundles are anyway.
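
A rough sketch of that snapshot-plus-thin-pack alternative might look like the following. The function name, the state directory and the file names are made up for illustration, and it assumes the objects named in the old snapshot are still present in the repository (a pruned tip would make the rev walk fail).

```shell
# Sketch only: names are illustrative, not part of any Git convention.
# usage: backup_increment <state-dir> <pack-file>
backup_increment () {
  state=$1 pack=$2
  mkdir -p "$state" || return 1

  # for initial backup, start from an empty snapshot
  test -e "$state/last-snapshot" || >"$state/last-snapshot"

  # record the current ref tips as "<oid> <refname>" lines
  git for-each-ref --format='%(objectname) %(refname)' \
    >"$state/new-snapshot" || return 1

  # new tips are positive; tips captured last time are negated with "^"
  {
    cut -d' ' -f1 "$state/new-snapshot"
    cut -d' ' -f1 "$state/last-snapshot" | sed -e 's/^/^/'
  } |
  git pack-objects -q --revs --thin --stdout >"$pack" &&

  # only advance the snapshot once the pack has been written
  mv "$state/new-snapshot" "$state/last-snapshot"
}
```

To restore, each pack could be fed in order to "git index-pack --stdin --fix-thin" inside a bare repository, with the refs then recreated from the matching snapshot.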

-Peff



