Hi,
what is the best way to sync multiple remotes with each other without
having to pull everything into a local checkout first?
Now that many projects have moved off of GitHub onto their own GitLab
instances, it has become hard to keep track of all of my contributions:
you basically have to create a fork on each of these instances to open
PRs, and because I'd like to have my contributions on my own GitLab
server too, I now have the issue of keeping multiple remote
repositories in sync. Sometimes there are also additional remotes from
others that I'd like to interact with, so these need to be mirrored as
well. Then there is the issue of archiving, i.e. freeing oneself from a
dependency getting deleted upstream, which I see at more and more
companies I work with. They want to have an internal fork of everything
(sometimes also wanting to review all of the commits before pulling
them into their internal git server; for most it was enough so far to
just fail on rewritten/force-pushed history). And for my personal
projects I'd like to have them on GitHub, GitLab, and my own GitLab
server, while being able to make edits (and accept PRs) at any of these
and keep the repos in sync (or at least automatically sync them as long
as nothing conflicts).
Is there a way to use some of the more advanced features of git to
accomplish this? Like e.g. the alternates (alternative object database)
mechanism in a local temp repository, and then pushing to the other
remotes from there, or something similar?
If any of you ran into similar issues in the past, how did you solve them?
What I tried so far in order:
My initial thought was to look for a way to:
1. "impersonate" a remote (something like "git clone --bare", but
   without actually cloning any of the objects, instead querying the
   remote on demand)
2. Any git command within that repo would then be treated as if it was
   run within/by the impersonated remote, e.g. it would consider all of
   the objects the remote has as its own.
3. Add all of the other remotes as usual (the impersonated one wouldn't
   be reported as a remote but as the local repo the others were added to)
4. A "git push --all" to any of these remotes would then cause git to
   download those objects from the impersonated one that it needs to
   push towards that remote (and hasn't already cached locally).
As I wasn't able to find such a feature, I tried to work around it. At
first I tried a shallow clone, but that didn't really work, as I didn't
know how deep I'd have to go to give all of the remotes a common commit
so they'd be detected as belonging together (esp. if one of the remotes
didn't exist yet and had to be created by pushing to it...).
Then I managed to find these two suboptimal approaches so far:
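For what it's worth, git's partial clone seems to come closest to the
"impersonated remote" idea above: refs, commits and trees are copied,
but blobs are only downloaded when an operation actually needs them.
This is only a sketch under assumptions: it uses a local stand-in
repository instead of a real server (real servers must have
uploadpack.allowFilter enabled), and all paths/names are placeholders.

```shell
#!/bin/sh
# Hedged sketch of a blob-less ("partial") clone against a local
# stand-in remote; paths are hypothetical placeholders.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in remote; real servers need uploadpack.allowFilter enabled.
git init --bare remote.git
git -C remote.git config uploadpack.allowFilter true
git clone "file://$tmp/remote.git" seed
(
  cd seed
  echo hello >file.txt
  git add file.txt
  git -c user.name=test -c user.email=test@example.com commit -m initial
  git push origin HEAD
)

# Blob-less bare clone: refs, commits and trees are local, but blobs
# are fetched lazily from the "promisor" remote only when needed.
git clone --bare --filter=blob:none "file://$tmp/remote.git" lazy.git
git -C lazy.git rev-parse HEAD
```

Whether this helps depends on the servers involved; older hosts may not
allow filtering at all, and lazily fetched blobs still land on disk once
a push needs them.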
1. pull-push-style:
   1. "git clone --mirror" from the first remote
   2. "git push --all" to all of the others
2. Without always pulling the entire repo from the same server,
   regardless of whether it has changes or not:
   1. Create a new, empty repo locally
   2. Add all of the remotes
   3. Fetch from the nearest server
   4. "git lfs fetch --all" from the nearest server
   5. Fetch from all of the others
   6. "git lfs fetch --all" from all of the others
   7. Hackishly overwrite all of the local refs with the ones of the
      remote that should be used as the source (aka "rm -rf
      .git/refs/heads/*; cp -r .git/refs/remotes/origin/*
      .git/refs/heads/")
   8. "git push --all" to all of the remotes
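For clarity, the 1st approach can be sketched as below. Local bare
repositories stand in for the real remotes (all paths are hypothetical
placeholders), and I use "push --mirror" here rather than "push --all",
since --mirror also propagates tags and deletions:

```shell
#!/bin/sh
# Sketch of the pull-push-style approach, with local bare repositories
# standing in for the real remotes (paths are hypothetical).
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in authoritative remote with one commit, plus an empty target.
git init --bare origin.git
git init --bare target.git
git clone origin.git seed
(
  cd seed
  git -c user.name=test -c user.email=test@example.com \
      commit --allow-empty -m initial
  git push origin HEAD
)

# Step 1: mirror-clone the authoritative remote (fetches everything).
git clone --mirror origin.git sync.git

# Step 2: push all refs to each of the other remotes; --mirror also
# propagates tags and ref deletions, which plain --all does not.
git -C sync.git push --mirror ../target.git

git ls-remote target.git
```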
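The 2nd approach can also be done without the "rm -rf .git/refs/..."
hack in step 7: a push refspec can send origin's remote-tracking refs
straight into the branch namespace of each target, so no local branches
are needed at all. A sketch under assumptions (local bare repos as
stand-in remotes, lfs steps omitted, paths hypothetical):

```shell
#!/bin/sh
# Sketch of the fetch-then-push approach, replacing the manual ref
# copying (step 7) with an explicit push refspec.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-ins for the real remotes.
git init --bare origin.git
git init --bare target.git
git clone origin.git seed
(
  cd seed
  git -c user.name=test -c user.email=test@example.com \
      commit --allow-empty -m initial
  git push origin HEAD
)

# Steps 1-2: new empty repo, add all of the remotes.
git init sync
cd sync
git remote add origin ../origin.git
git remote add target ../target.git

# Steps 3 and 5: fetch from every remote (nearest first in practice).
git fetch --all

# Steps 7-8 combined: push origin's remote-tracking refs directly into
# the branch namespace of each other remote.
git push target 'refs/remotes/origin/*:refs/heads/*'

git ls-remote ../target.git
```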
The 1st one is the simplest and almost always works (it only fails in
some very rare cases, like when the remote contains "zero-padded file
modes" and such), but it causes unnecessary load on the origin server,
as it always has to make an entire clone of the repo first, even if the
remotes are already in sync. Having local state would partially solve
that, but then I'd need a lot of disk space and I wouldn't be able to
just run this within a CI job. => Therefore undesirable.
The 2nd one is more sophisticated and may look kinda right at first,
but it keeps breaking all the time for countless different reasons.
Even though it only pulls new/changed commits from the origin server
(aka it doesn't overload a single server unnecessarily), it still
clones all of the repos from the local git server onto the CI worker.
Esp. with larger repos using git-lfs (or when trying to do this in
parallel for multiple/all of my repos) it causes the CI worker to run
out of disk space, as well as a lot of unnecessary network and disk IO.
=> Therefore also undesirable.
Also, none of my current approaches can do a real sync: they all rely
on one of the remotes being designated as the authoritative source, and
if any of the others changed instead, they'll just fail.
Sincerely,
Klaus Frank