On Tue, May 6, 2025 at 5:55 PM Derrick Stolee via GitGitGadget <gitgitgadget@xxxxxxxxx> wrote: > > From: Derrick Stolee <stolee@xxxxxxxxx> > > The sparse index allows storing directory entries in the index, marked > with the skip-wortkree bit and pointing to a tree object. This may be an > unexpected data shape for some implementation areas, so we are rolling > it out incrementally on a builtin-per-builtin basis. > > This change enables the sparse index for 'git apply'. The main > motivation for this change is that 'git apply' is used as a child > process of 'git add -p' and expanding the sparse index for each of those > child processes can lead to significant performance issues. > > The good news is that the actual index manipulation code used by 'git > apply' is already integrated with the sparse index, so the only product > change is to mark the builtin as allowing the sparse index so it isn't > inflated on read. > > The more involved part of this change is around adding tests that verify > how 'git apply' behaves in a sparse-checkout environment and whether or > not the index expands in certain operations. > > Signed-off-by: Derrick Stolee <stolee@xxxxxxxxx> > --- > builtin/apply.c | 7 +++- > t/t1092-sparse-checkout-compatibility.sh | 46 ++++++++++++++++++++++++ > 2 files changed, 52 insertions(+), 1 deletion(-) > > diff --git a/builtin/apply.c b/builtin/apply.c > index 84f1863d3ac3..a1e20c593d09 100644 > --- a/builtin/apply.c > +++ b/builtin/apply.c > @@ -12,7 +12,7 @@ static const char * const apply_usage[] = { > int cmd_apply(int argc, > const char **argv, > const char *prefix, > - struct repository *repo UNUSED) > + struct repository *repo) > { > int force_apply = 0; > int options = 0; > @@ -35,6 +35,11 @@ int cmd_apply(int argc, > &state, &force_apply, &options, > apply_usage); > > + if (repo) { > + prepare_repo_settings(repo); > + repo->settings.command_requires_full_index = 0; > + } > + > if (check_apply_state(&state, force_apply)) > exit(128); > > diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh > index f9b448792cb4..ab8bd371eff3 100755 > --- a/t/t1092-sparse-checkout-compatibility.sh > +++ b/t/t1092-sparse-checkout-compatibility.sh > @@ -1340,6 +1340,30 @@ test_expect_success 'submodule handling' ' > grep "160000 $(git -C initial-repo rev-parse HEAD) 0 modules/sub" cache > ' > > +test_expect_success 'git apply functionality' ' > + init_repos && > + > + test_all_match git checkout base && > + > + git -C full-checkout diff base..merge-right -- deep >patch-in-sparse && > + git -C full-checkout diff base..merge-right -- folder2 >patch-outside && > + > + # Apply a patch to a file inside the sparse definition > + test_all_match git apply --index --stat ../patch-in-sparse && > + test_all_match git status --porcelain=v2 && > + > + # Apply a patch to a file outside the sparse definition > + test_sparse_match test_must_fail git apply ../patch-outside && > + grep "No such file or directory" sparse-checkout-err && I was slightly confused by this at first, because I was thinking of the case where folder2/a exists in the working directory despite not matching the sparsity patterns, but you were testing a case where the working directory matched the sparsity patterns, so folder2/a doesn't exist. So, the check here looks good. And, when folder2/a does exist, then we're in the case handled by 82386b44963f (Merge branch 'en/present-despite-skipped', 2022-03-09), which forces the directory to not be considered sparse, and so it's just like the ../patch-in-sparse case...meaning it's not really a different case to test. So, it all makes sense. > + > + # But it works with --index and --cached > + test_all_match git apply --index --stat ../patch-outside && > + test_all_match git status --porcelain=v2 && > + test_all_match git reset --hard && > + test_all_match git apply --cached --stat ../patch-outside && > + test_all_match git status --porcelain=v2 > +' > + > # When working with a sparse index, some commands will need to expand the > # index to operate properly. If those commands also write the index back > # to disk, they need to convert the index to sparse before writing. > @@ -2345,6 +2369,28 @@ test_expect_success 'sparse-index is not expanded: check-attr' ' > ensure_not_expanded check-attr -a --cached -- folder1/a > ' > > +test_expect_success 'sparse-index is not expanded: git apply' ' > + init_repos && > + > + git -C sparse-index checkout base && > + git -C full-checkout diff base..merge-right -- deep >patch-in-sparse && > + git -C full-checkout diff base..merge-right -- folder2 >patch-outside && > + > + # Apply a patch to a file inside the sparse definition > + ensure_not_expanded apply --index --stat ../patch-in-sparse && > + > + # Apply a patch to a file outside the sparse definition > + # Fails when caring about the worktree. > + ensure_not_expanded ! apply ../patch-outside && > + > + # Expands when using --index. > + ensure_expanded apply --index ../patch-outside && > + git -C sparse-index reset --hard && All makes sense up to here. > + > + # Does not expand when using --cached. > + ensure_not_expanded apply --cached ../patch-outside Wait, what? That makes no sense. After some digging, I see why the test passed, but it's very misleading. Just before this command, if you ran the following commands from the sparse-index directory, you'd see the following: $ rm testme $ GIT_TRACE2_EVENT=$(pwd)/testme git apply --cached ../patch-outside $ grep ensure_full_index testme $ Which matches what you were testing and shows why it passed for you. But I'd argue the test is not correct and confusing for anyone that reads it, because: $ git ls-files -s --sparse | grep folder2 040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0 folder2/0/ 100644 78981922613b2afb6025042ff6bd878ac1994e85 0 folder2/a In other words, the index was *already* (partially) expanded by the `git apply --index`, and the `git reset --hard` did not fix that contrary to expectations. Continuing from here we see: $ git reset --hard HEAD is now at 703fd3e initial commit $ git ls-files -s --sparse | grep folder2 040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0 folder2/0/ 100644 78981922613b2afb6025042ff6bd878ac1994e85 0 folder2/a $ git sparse-checkout reapply $ git ls-files -s --sparse | grep folder2 040000 123706f6fc38949628eaf0483edbf97ba21123ae 0 folder2/ So, we need to do a `git sparse-checkout reapply` to make sure we were actually in the expected fully sparse state. From here... $ rm testme $ GIT_TRACE2_EVENT=$(pwd)/testme git apply --cached ../patch-outside $ grep ensure_full_index testme {"event":"region_enter","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.856008Z","file":"sparse-index.c","line":372,"repo":1,"nesting":1,"category":"index","label":"ensure_full_index"} {"event":"region_leave","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.856454Z","file":"sparse-index.c","line":455,"repo":1,"t_rel":0.000446,"nesting":1,"category":"index","label":"ensure_full_index"} {"event":"region_enter","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.857016Z","file":"sparse-index.c","line":372,"repo":1,"nesting":1,"category":"index","label":"ensure_full_index"} {"event":"region_leave","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.857135Z","file":"sparse-index.c","line":455,"repo":1,"t_rel":0.000119,"nesting":1,"category":"index","label":"ensure_full_index"} So, indeed, `git apply --cached ../patch-outside` DOES expand the index, as I expected. It has to when folder2/ is a directory in the index, so that we can get a folder2/a entry that we can modify. And that's just what we see: $ git ls-files -s --sparse | grep folder2 040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0 folder2/0/ 100644 50f4ca4e6265a0497ec2ee6782648138914ad398 0 folder2/a Can you add a `git sparse-checkout reapply` right after your `git reset --hard`, and then switch the ensure_not_expanded to ensure_expanded for the apply --cached call?