Way to "impersonate" remote or sync remotes without fetching everything?

Hi,

what is the best way to sync multiple remotes with each other without first having to pull everything into a local checkout?

Now that many projects have moved off of GitHub onto their own GitLab instances, it is getting hard to keep track of all of my contributions: you basically have to create a fork on each of these instances to open PRs, and because I'd like to have my contributions on my own GitLab server too, I now have to keep multiple remote repositories in sync. And sometimes there are additional remotes from others that you'd like to interact with as well, so those need to be mirrored too.

Then there is also the issue of archiving, i.e. protecting oneself against a dependency getting deleted upstream, which I see at more and more companies I work with. They want to have an internal fork of everything (sometimes also wanting to review all of the commits before pulling them into their internal git server; but for most it was enough so far to just fail on rewritten/force-pushed history).

And for my personal projects I'd like to have them on GitHub, GitLab, and my own GitLab server, while being able to make edits (and accept PRs) at any of these and keep the repos in sync (or at least automatically sync them as long as nothing conflicts).

Is there a way to use some of the more advanced features of git to accomplish this? Like e.g. the alternates (alternate object database) mechanism in a local temp repository, and then pushing to the other remotes, or something along those lines?

If any of you ran into similar issues in the past, how did you solve them?


What I tried so far in order:

My initial thought was to look for a way to do:

1. "impersonate" a remote (something like "git clone --bare", but
   without actually cloning any of the objects, querying the remote
   on demand instead)
2. Any git command within that repo would then be treated as if it
   were run within/by the impersonated remote. E.g. it would consider
   all of the objects the remote has as its own.
3. Add all of the other remotes as usual (the impersonated one wouldn't
   be reported as a remote but as the local repo the others were added
   to)
4. Doing a "git push --all" to any of these remotes would cause git to
   download only those objects from the impersonated one that it needs
   to push towards that remote (and hasn't cached locally already).
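For reference, git's partial clone feature comes fairly close to point 1 (though it does cache the objects it fetches): a bare clone with "--filter=blob:none" downloads only commits and trees up front and lazily fetches blobs from the promisor remote whenever a later command needs them. A minimal sketch, using throwaway local repos in place of the real servers:

```shell
# Sketch: approximate the "impersonated remote" with a bare, blob-less
# partial clone. Commits and trees are fetched up front; blobs are only
# fetched on demand when a later command (such as push) needs them.
set -e
tmp=$(mktemp -d)

# Stand-in for the origin server, seeded with one commit.
git init -q -b main --bare "$tmp/origin.git"
git -C "$tmp/origin.git" config uploadpack.allowFilter true
git clone -q "$tmp/origin.git" "$tmp/work"
git -C "$tmp/work" -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$tmp/work" push -q origin HEAD:main

# The lazy "mirror": bare + partial clone, blobs stay on the server.
git clone -q --bare --filter=blob:none "file://$tmp/origin.git" "$tmp/mirror.git"

# A second, still-empty server to sync towards.
git init -q -b main --bare "$tmp/other.git"
git -C "$tmp/mirror.git" remote add other "$tmp/other.git"

# Pushing lazily fetches any still-missing objects from origin (the
# "promisor" remote) before sending them on to the other server.
git -C "$tmp/mirror.git" push -q --all other
```

This is not a full impersonation (fetched objects pile up locally over time), but it avoids transferring blobs that no push ever needs.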

As I wasn't able to find such a feature, I tried to work around it. At first I tried a shallow clone, but that didn't really work, as I didn't know where I'd have to start in order to give all of the remotes a common commit so they'd be detected as belonging together (esp. if one of the remotes didn't exist yet and had to be created by pushing to it...).

So far I have managed to find these two suboptimal approaches:

1. pull-push-style:
    1. "git clone --mirror" the first one (a mirror clone is
       implicitly bare)
    2. "git push --all" to all of the other(s)
2. Without always pulling the entire repo from the same server,
   regardless of whether it has changes or not:
    1. Create a new, empty repo locally
    2. Add all of the remotes
    3. Fetch from the nearest server
    4. "git lfs fetch --all" from the nearest server
    5. Fetch from all others
    6. "git lfs fetch --all" from all others
    7. Hackishly update all of the local refs to basically be the ones
       of the remote that should be used as source (aka "rm -rf
       .git/refs/heads/*; cp -r .git/refs/remotes/origin/*
       .git/refs/heads/")
    8. "git push --all" to all of the remotes
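As an aside, steps 7 and 8 of the second approach can be collapsed into a single push with a wildcard refspec, which avoids touching .git/refs by hand (the "rm -rf"/"cp" trick also misses refs stored in .git/packed-refs). A sketch with throwaway local repos standing in for the servers:

```shell
# Steps 7 and 8 above, without the rm -rf/cp hack: push the
# remote-tracking refs of the source remote straight to the target
# using a wildcard refspec.
set -e
tmp=$(mktemp -d)

# Stand-in for the source server, seeded with one commit.
git init -q -b main --bare "$tmp/src.git"
git clone -q "$tmp/src.git" "$tmp/seed"
git -C "$tmp/seed" -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$tmp/seed" push -q origin HEAD:main

# Steps 1-3: new empty repo, add the remotes, fetch from the source.
git init -q -b main "$tmp/sync"
git -C "$tmp/sync" remote add origin "$tmp/src.git"
git init -q -b main --bare "$tmp/dst.git"
git -C "$tmp/sync" remote add dst "$tmp/dst.git"
git -C "$tmp/sync" fetch -q origin
# Drop origin/HEAD if it exists, so the wildcard below only matches
# real branches.
git -C "$tmp/sync" remote set-head origin -d >/dev/null 2>&1 || true

# Steps 7+8 in one go: mirror origin's branches onto dst.
git -C "$tmp/sync" push -q dst 'refs/remotes/origin/*:refs/heads/*'
```

This doesn't fix the disk-space problem (the objects are still fetched locally), but it removes one of the breakage-prone steps.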

The 1st one is the simplest and almost always works (it only fails in some very rare cases, like when the remote contains "zero-padded file modes" and such), but it causes unnecessary load on the origin server, as it always has to make an entire clone of the repo first, even if the remotes are already in sync. Keeping local state would partially solve that, but then I'd need a lot of disk space and I wouldn't be able to just run this within a CI job. => Therefore undesirable.

The 2nd one is more sophisticated and may look kinda right at first, but it keeps breaking for countless different reasons. Even though it only pulls new/changed commits from the origin server (i.e. it doesn't overload a single server unnecessarily), it still clones all of the repos from the local git server onto the CI worker. Especially with larger repos using git-lfs (or when syncing multiple/all of my repos in parallel), this makes the CI worker run out of disk space and causes a lot of unnecessary network and disk IO. => Therefore also undesirable.

Not to mention that none of my current approaches does a real sync: they all rely on one of the remotes being designated as the authoritative source, and if any of the others changed instead, they just fail.
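One building block that may help here: whether two remotes are in sync (or have diverged) can be checked without transferring any objects at all, since "git ls-remote" only lists the ref tips a server advertises. A sketch with throwaway local repos standing in for the servers:

```shell
# Divergence check without any object transfer: compare the ref tips
# that each server advertises via ls-remote.
set -e
tmp=$(mktemp -d)

# Two stand-in servers, b.git mirrored from a.git (i.e. in sync).
git init -q -b main --bare "$tmp/a.git"
git clone -q "$tmp/a.git" "$tmp/seed"
git -C "$tmp/seed" -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$tmp/seed" push -q origin HEAD:main
git clone -q --mirror "$tmp/a.git" "$tmp/b.git"

a=$(git ls-remote "$tmp/a.git" 'refs/heads/*' | sort)
b=$(git ls-remote "$tmp/b.git" 'refs/heads/*' | sort)
if [ "$a" = "$b" ]; then
    echo "in sync, nothing to transfer"
else
    echo "differ, a sync (or conflict resolution) is needed"
fi
```

It doesn't tell you *who* is ahead when the tips differ (that still needs the commits), but it lets a CI job skip the expensive clone entirely when everything already matches.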

Sincerely,
Klaus Frank
