Re: git treeless-clone + wait + pull → problem, again pull → OK

[+cc Jonathan Tan]

On Tue, Jul 01, 2025 at 12:24:05PM +0300, Дилян Палаузов wrote:

> the problem is that when I do a treeless or blobless clone and some
> time later run git pull, git prints many, many lines saying it is
> trying to fetch data; I then interrupt with Ctrl+C, run git pull
> again, and it completes.  However, I never tried to precisely
> document this until now:
> 
> On 26 June 2025 I do
> 
> $ git clone --filter=tree:0 https://github.com/git/git.git
>
> $ git show --oneline
> cf6f63ea6 (HEAD -> master, origin/master, origin/HEAD) The fourth batch
> 
> 
> Today I do 
> 
> $ git pull
> From https://github.com/git/git
>    cf6f63ea6..83014dc05  master     -> origin/master
>    74e6fc65d..83e99ddf4  next       -> origin/next
>  + bc3287e71...a842a7780 seen       -> origin/seen  (forced update)
>    fefffbb31..7af8e2e03  todo       -> origin/todo
> fatal: You are attempting to fetch cf6f63ea6bf35173e02e18bdc6a4ba41288acff9, which is in the commit graph file but not in the object database.
> This is probably due to repo corruption.
> If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object.
> fatal: could not fetch 5e66731277a4d791043dc51e2804dc0b496c523b from promisor remote

I took a look at this a few weeks ago and came up with a reproducible
recipe:

  git init repo
  cd repo

  url=https://github.com/git/git.git
  git fetch --filter=tree:0 $url cf6f63ea6bf35173e02e18bdc6a4ba41288acff9:refs/heads/foo
  git checkout foo
  git commit-graph write --reachable
  git fetch $url

That final fetch ends up spawning a seemingly endless (though I suspect
actually finite) series of child fetches. Looking at the call stack,
it's coming from fetch_submodules(), which wants to do tree diffs in the
fetched history looking for changed submodules. But of course we don't
have those trees in a treeless clone, so we fault them in one by one.
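
If you want to actually watch that cascade, GIT_TRACE will at least log
each spawned subprocess, so something like this (the grep is just a
rough filter on my part) surfaces the child fetches:

  GIT_TRACE=1 git fetch $url 2>&1 | grep 'run_command.*fetch'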

Which certainly seems non-ideal. But what is more interesting is that
while doing so, we eventually hit that same fatal error:

  fatal: You are attempting to fetch cf6f63ea6bf35173e02e18bdc6a4ba41288acff9, which is in the commit graph file but not in the object database.

I tried swapping out $url for a local copy of the repo (to stop
hammering poor GitHub's servers). But it doesn't seem to reproduce. That
plus the apparently-random time of failure makes me think it's a race
condition.
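
By "local copy" I just mean something along these lines (the path is
arbitrary; the mirror clone is only there to take GitHub out of the
picture):

  git clone --mirror https://github.com/git/git.git /tmp/git-mirror.git
  url=/tmp/git-mirror.git   # then re-run the recipe above with this $url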

And there is something interesting happening in the background here:
each of those sub-fetches may kick off an asynchronous "git maintenance"
run. Which will find something useful to do because we're building up a
big pile of packs. So it eventually tries to repack.
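
If that theory is right, then suppressing the auto-maintenance ought to
make the failure disappear. I haven't actually verified that, but the
knob would be something like:

  # "-c" is exported to the child fetches via the environment, so the
  # background "git maintenance run --auto" should never get spawned
  git -c maintenance.auto=false fetch $url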

And so it seems our race is that fetch sees the commit in the commit
graph but sometimes _not_ the actual packfile (because it got repacked).
Usually we'd try to re-scan the packfiles for exactly this case. But the
caller in deref_without_lazy_fetch() does not do so. It calls
odb_has_object() without any flags, avoiding the re-scan, like this:

          commit = lookup_commit_in_graph(the_repository, oid);
          if (commit) {
                  if (mark_tags_complete_and_check_obj_db) {
                          if (!odb_has_object(the_repository->objects, oid, 0))
                                  die_in_commit_graph_only(oid);
                  }
                  return commit;
          }

This is due to 5d4cc78f72 (fetch-pack: die if in commit graph but not
obj db, 2024-11-05). I'm not sure I fully understand all of the details
of that commit, so I don't have a solution. Maybe it should be passing
HAS_OBJECT_RECHECK_PACKED?

Normally I'd worry about performance, since fetch is often asking about
objects we don't expect to have (and that re-scan is expensive and
usually just confirms that no, we don't have the object). But in this
case we'll only hit this check if lookup_commit_in_graph() actually
found the commit, which implies we _do_ expect to have it (or it's a
weird state where the graph file is stale). So maybe it would be OK to
use that flag in this call?


I don't think this race is strictly limited to using filters. It's just
that a filtered clone is a convenient way to start a big string of
fetches that will race with repacks.

Whether or not fetch should avoid kicking off that big string of
fetches, I don't know. Passing --no-recurse-submodules obviously dulls
the pain. Perhaps the default behavior ought to be different in a
tree-less repo. Or maybe those tree diffs should be done with
lazy-fetching turned off (there is no point in recursing for a version
of a submodule whose parent tree we don't even have!). But I think
that's all orthogonal to the race.
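
For anyone bitten by this in the meantime, the workaround above is just:

  # skip the submodule recursion for a single fetch...
  git fetch --no-recurse-submodules $url

  # ...or turn it off persistently in the treeless/blobless clone
  git config fetch.recurseSubmodules false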

Hmm. So I actually was just intending to write up my notes from a few
weeks ago. But I think I may have talked myself into the idea that this
patch is the right fix:

diff --git a/fetch-pack.c b/fetch-pack.c
index 5e74235fc0..7288e2d251 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -142,7 +142,8 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
 	commit = lookup_commit_in_graph(the_repository, oid);
 	if (commit) {
 		if (mark_tags_complete_and_check_obj_db) {
-			if (!odb_has_object(the_repository->objects, oid, 0))
+			if (!odb_has_object(the_repository->objects, oid,
+					    HAS_OBJECT_RECHECK_PACKED))
 				die_in_commit_graph_only(oid);
 		}
 		return commit;

I'd like to hear what Jonathan Tan thinks, though.

-Peff



