[PATCH v2 0/2] [2.51.0 Bug] Missing singleton objects in 'git repack -adf --path-walk'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Now that the --path-walk feature for git repack is out in the wild and
getting more visibility than it did in the Git for Windows fork, the
following issue was brought to my attention:

Some folks would report missing objects after git repack -adf --path-walk!

It turns out that this snuck through the cracks because it was pretty
difficult to create a reproducing test case (patch 1) but it boils down to:

 1. A path has exactly one version across all of the history being repacked.
 2. That path is in the index.
 3. The object at that path is not a loose object.
 4. pack.useSparse=true in the config (this is the default)

It is also something where users don't necessarily notice the missing
objects until they fetch and a missing object is used as a delta base. Doing
normal checkouts doesn't cause changes to these files, so they are never
opened by Git. Users hitting this issue can usually recover using git fetch
--refetch to repopulate the missing objects from a remote (unless they never
had a remote at all).

Patch 1 introduces the fix for this issue, which is related to forgetting to
initialize a struct indicator when walking the pending objects.

When reflecting on the ways that I missed this when building the feature, I
think the core issue was an overreliance on using bare repos in testing. I
also think that the way that the UNINTERESTING object exploration was
implemented was particularly fragile to missing updates to the
initialization of the struct, so patch 2 adds a new initializer to reduce
duplicate code and to help avoid this mistake in the future.


Updates in v2
=============

Thank you for the quick and careful review of these patches.

 * The test and bug fix are now in the same patch.
 * Several commit message typos/grammar edits.

Thanks, -Stolee

P.S. CC'ing all original reviewers of the series.

Derrick Stolee (2):
  path-walk: fix setup of pending objects
  path-walk: create initializer for path lists

 path-walk.c       | 55 +++++++++++++++++++----------------------
 t/t7700-repack.sh | 63 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+), 30 deletions(-)


base-commit: c44beea485f0f2feaf460e2ac87fdd5608d63cf0
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1956%2Fderrickstolee%2Fpath-walk-missing-objects-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1956/derrickstolee/path-walk-missing-objects-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1956

Range-diff vs v1:

 1:  5b19173c03d ! 1:  2feedf5a1ee t7700: add failing --path-walk test
     @@ Metadata
      Author: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
      
       ## Commit message ##
     -    t7700: add failing --path-walk test
     +    path-walk: fix setup of pending objects
      
          Users reported an issue where objects were missing from their local
     -    enlistments after a full repack using 'git repack -adf --path-walk'.
     -    This was alarming, but took a while to create a reproducer.
     +    repositories after a full repack using 'git repack -adf --path-walk'.
     +    This was alarming and took a while to create a reproducer. Here, we fix
     +    the bug and include a test case that would fail without this fix.
      
          The root cause is that certain objects existed in the index and had no
          second versions. These objects are usually blobs, though trees can be
     @@ Commit message
          issue for reproducibility:
      
           1. File a/b/c has only one committed version.
     -     2. Files a/i and x/y only exists as staged changes.
     +     2. Files a/i and x/y only exist as staged changes.
           3. Tree x/ only exists in the cache-tree.
      
          After performing a non-path-walk repack to force all loose objects into
     @@ Commit message
          Note that since the staged root tree is missing, the fsck output cannot
          even report that the staged x/ tree is missing as well.
      
     -    This bug will be fixed in the next change.
     +    The core problem here is that the "maybe_interesting" member of 'struct
     +    type_and_oid_list' is not initialized to '1'. This member was added in
     +    6333e7ae0b (path-walk: mark trees and blobs as UNINTERESTING,
     +    2024-12-20) in a way to help when creating packfiles for a small commit
     +    range using the sparse path algorithm (enabled by pack.useSparse=true).
     +
     +    The idea here is that the list is marked as "maybe_interesting" if an
     +    object is added that does not have the UNINTERESTING flag on it. Later,
     +    this is checked again in case all objects in the list were marked
     +    UNINTERESTING after that point in time. In this case, the algorithm
     +    skips the list as there is no reason to visit it.
     +
     +    This leads to the problem where the "maybe_interesting" member was not
     +    appropriately initialized when the list is created from pending objects.
     +    Initializing this in the correct places fixes the bug.
     +
     +    To reduce risk of similar bugs around initializing this structure, a
     +    follow-up change will make initializing lists use a shared method.
      
          Signed-off-by: Derrick Stolee <stolee@xxxxxxxxx>
      
     + ## path-walk.c ##
     +@@ path-walk.c: static int setup_pending_objects(struct path_walk_info *info,
     + 					list->type = OBJ_TREE;
     + 					strmap_put(&ctx->paths_to_lists, path, list);
     + 				}
     ++				list->maybe_interesting = 1;
     + 				oid_array_append(&list->oids, &obj->oid);
     + 				free(path);
     + 			} else {
     +@@ path-walk.c: static int setup_pending_objects(struct path_walk_info *info,
     + 					list->type = OBJ_BLOB;
     + 					strmap_put(&ctx->paths_to_lists, path, list);
     + 				}
     ++				list->maybe_interesting = 1;
     + 				oid_array_append(&list->oids, &obj->oid);
     + 			} else {
     + 				/* assume a root tree, such as a lightweight tag. */
     +
       ## t/t7700-repack.sh ##
      @@ t/t7700-repack.sh: test_expect_success '-n overrides repack.updateServerInfo=true' '
       	test_server_info_missing
       '
       
     -+test_expect_failure 'pending objects are repacked appropriately' '
     ++test_expect_success 'pending objects are repacked appropriately' '
     ++	test_when_finished rm -rf pending &&
      +	git init pending &&
      +
      +	(
      +		cd pending &&
      +
     ++		# Commit file, a/b/c and never change them.
      +		mkdir -p a/b &&
      +		echo singleton >file &&
      +		echo stuff >a/b/c &&
     @@ t/t7700-repack.sh: test_expect_success '-n overrides repack.updateServerInfo=tru
      +		git add file a &&
      +		git commit -m "single blobs" &&
      +
     ++		# Files a/d and a/e will not be singletons.
      +		echo d >a/d &&
      +		echo e >a/e &&
      +		git add a &&
     @@ t/t7700-repack.sh: test_expect_success '-n overrides repack.updateServerInfo=tru
      +		# test that the cache-tree is walked, too.
      +		git sparse-checkout set --sparse-index a x &&
      +
     -+		# Just _stage_ the changes.
     ++		# Create staged changes:
     ++		# * a/e now has multiple versions.
     ++		# * a/i now has only one version.
      +		echo f >a/d &&
      +		echo h >a/e &&
      +		echo i >a/i &&
     ++		git add a &&
     ++
     ++		# Stage and unstage a change to make use of
     ++		# resolve-undo cache and how that impacts fsck.
      +		mkdir x &&
      +		echo y >x/y &&
     -+		git add a x &&
     ++		git add x &&
     ++		xy=$(git rev-parse :x/y) &&
     ++		git rm --cached x/y &&
     ++
     ++		# The blob for x/y must persist through repacks,
     ++		# but fsck currently ignores the REUC extension
     ++		# for finding links to the blob.
     ++		cat >expect <<-EOF &&
     ++		dangling blob $xy
     ++		EOF
      +
      +		# Bring the loose objects into a packfile to avoid
      +		# leftovers in next test. Without this, the loose
      +		# objects persist and the test succeeds for other
      +		# reasons.
      +		git repack -adf &&
     -+		git fsck &&
     ++		git fsck >out &&
     ++		test_cmp expect out &&
      +
      +		# Test path walk version with pack.useSparse.
      +		git -c pack.useSparse=true repack -adf --path-walk &&
     -+		git fsck
     ++		git fsck >out &&
     ++		test_cmp expect out
      +	)
      +'
      +
 2:  0dc4a6323e6 < -:  ----------- path-walk: fix setup of pending objects
 3:  dd458e04354 = 2:  fc2c171f52d path-walk: create initializer for path lists

-- 
gitgitgadget




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux