Hi Justin/Jeff (the previous email got rejected due to HTML content),

I tried researching more. If we keep the previous state of the repo
around, we can run "git fetch --all" and get storage-level incremental
backups from the changed objects under .git/objects (the new .pack
files, kept separate by preventing auto gc); a rough sketch of that
flow is at the end of this note. But keeping the repo clone around
just for incremental backups is not feasible for us.

I also looked into "git fetch-pack" against a "git init --bare" repo,
which might have helped here, but it does not work as expected:

1. It doesn't work with https:

   $ git fetch-pack --thin --shallow-exclude=28307688f7344018cad46c310826a82041b39b8d https://github.com/elastic/elasticsearch refs/heads/main
   fatal: protocol 'https' is not supported

2. With ssh the remote hangs up:

   $ git fetch-pack --thin --shallow-exclude=28307688f7344018cad46c310826a82041b39b8d git@xxxxxxxxxx:elastic/elasticsearch.git refs/heads/main
   fatal: the remote end hung up unexpectedly

Is what I need here (fetching new objects without requiring the
previous objects to be present locally) technically possible with the
git CLI or the libgit2 library? We can keep metadata recording which
commits were backed up for each ref in the previous backup, if that
helps.

As an alternative I tried downloading commit blobs through API
requests, but that hits rate limits too often and is far slower than
the git protocol.
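For reference, this is roughly the flow I mean (an untested sketch; the
URL, paths, and config values are placeholders). It only works because
the full mirror clone sticks around between runs, which is exactly the
part we want to avoid:

  # one-time setup: a long-lived mirror whose packs are never repacked,
  # so every later fetch leaves its new objects in a fresh pack
  git clone --mirror git@example.com:org/repo.git repo.git
  git -C repo.git config gc.auto 0
  git -C repo.git config maintenance.auto false
  git -C repo.git config transfer.unpackLimit 1  # keep fetched objects packed

  # each backup run: note which packs exist, fetch, copy only the new ones
  cd repo.git
  ls objects/pack/*.idx | sort >/tmp/packs.before
  git fetch --all --prune
  ls objects/pack/*.idx | sort >/tmp/packs.after

  dest=/backup/$(date +%F)   # placeholder destination for this run
  mkdir -p "$dest"
  comm -13 /tmp/packs.before /tmp/packs.after |
  while read -r idx; do
    cp "$idx" "${idx%.idx}.pack" "$dest"
  done

  # plus a ref snapshot so we know what state the packs correspond to
  git for-each-ref --format='%(objectname) %(refname)' >"$dest/refs"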
-----Original Message-----
From: Jeff King <peff@xxxxxxxx>
Sent: 09 May 2025 00:09
To: Abhishek Dalmia <adalmia@xxxxxxxxxxxxx>
Cc: Justin Tobler <jltobler@xxxxxxxxx>; Akash S <akashs@xxxxxxxxxxxxx>; git@xxxxxxxxxxxxxxx; Adithya Urugudige <aurugudige@xxxxxxxxxxxxx>
Subject: Re: Incremental Backup of repositories using Git

On Thu, May 08, 2025 at 10:24:55AM +0000, Abhishek Dalmia wrote:

> I ran into an edge case while testing incremental backups with git
> bundle. If a commit is created with a timestamp earlier than the
> latest full or incremental backup, it can be excluded from the next
> bundle due to the --since parameter even if there is a buffer.

Yeah, I don't think you want to use "--since" here, since it is about
commit timestamps. You care about the state of the refs at a particular
time. Or more accurately, you care that you have captured a particular
ref state previously.

So ideally you'd snapshot that state in an atomic way, feed it as the
"current" state when doing a bundle, and then save it for later. You can
easily create such a snapshot with for-each-ref, but I don't think
git-bundle has a way to provide the exact set of ref tips and their
values (it just takes rev-list arguments, and wants to resolve the refs
themselves).

You could probably get away with just creating a bundle with the current
state, and then pulling the snapshot values from the created bundle.
Something like this:

  # for initial backup
  if ! test -e last-bundle-snapshot; then
    >last-bundle-snapshot
  fi

  # mark everything from last as seen, so we do not include it,
  # along with --all (or your choice of refs) to pick up everything
  # we have currently
  sed -e 's/^/^/' <last-bundle-snapshot |
  git bundle create out.bundle --all --stdin

  # and now save that ref state for next time; this is inherently
  # peeking at the bundle format.
  sed -ne '
    # quit when we see end of header
    /^$/q;
    # drop comments and old negatives; copy only first word (the oid)
    s/^\([^-#][^ ]*\).*/\1/p;
  ' <out.bundle >last-bundle-snapshot

Or alternatively, instead of using git-bundle at all, you could just
store a collection of ref snapshots (from "for-each-ref") and thin
packs (from "pack-objects --thin --stdout", fed from the old snapshot
and the new). Which is really all that bundles are anyway.

-Peff
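PS: for concreteness, that snapshot-plus-thin-pack alternative might
look roughly like this (untested sketch; the file names are
placeholders, the unrestricted for-each-ref stands in for "your choice
of refs", and restoring a thin pack needs "git index-pack --stdin
--fix-thin" in a repo that already has the earlier objects):

  # previous snapshot of ref tips; empty on the first (full) run
  test -e last-refs || >last-refs

  # snapshot the current tips straight from the refs
  git for-each-ref --format='%(objectname) %(refname)' >current-refs

  # pack everything reachable from the current tips but not from the
  # old ones; --thin only has an effect together with --stdout
  stamp=$(date +%s)
  {
    cut -d' ' -f1 last-refs | sed 's/^/^/'
    cut -d' ' -f1 current-refs
  } | git pack-objects --revs --thin --stdout >"incr-$stamp.pack"

  # keep the snapshot next to the pack (that pair is the "bundle"),
  # then roll it over for the next run
  cp current-refs "incr-$stamp.refs"
  mv current-refs last-refs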