Re: [PATCH 1/3] apply: integrate with the sparse index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 6, 2025 at 5:55 PM Derrick Stolee via GitGitGadget
<gitgitgadget@xxxxxxxxx> wrote:
>
> From: Derrick Stolee <stolee@xxxxxxxxx>
>
> The sparse index allows storing directory entries in the index, marked
> with the skip-wortkree bit and pointing to a tree object. This may be an
> unexpected data shape for some implementation areas, so we are rolling
> it out incrementally on a builtin-per-builtin basis.
>
> This change enables the sparse index for 'git apply'. The main
> motivation for this change is that 'git apply' is used as a child
> process of 'git add -p' and expanding the sparse index for each of those
> child processes can lead to significant performance issues.
>
> The good news is that the actual index manipulation code used by 'git
> apply' is already integrated with the sparse index, so the only product
> change is to mark the builtin as allowing the sparse index so it isn't
> inflated on read.
>
> The more involved part of this change is around adding tests that verify
> how 'git apply' behaves in a sparse-checkout environment and whether or
> not the index expands in certain operations.
>
> Signed-off-by: Derrick Stolee <stolee@xxxxxxxxx>
> ---
>  builtin/apply.c                          |  7 +++-
>  t/t1092-sparse-checkout-compatibility.sh | 46 ++++++++++++++++++++++++
>  2 files changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/apply.c b/builtin/apply.c
> index 84f1863d3ac3..a1e20c593d09 100644
> --- a/builtin/apply.c
> +++ b/builtin/apply.c
> @@ -12,7 +12,7 @@ static const char * const apply_usage[] = {
>  int cmd_apply(int argc,
>               const char **argv,
>               const char *prefix,
> -             struct repository *repo UNUSED)
> +             struct repository *repo)
>  {
>         int force_apply = 0;
>         int options = 0;
> @@ -35,6 +35,11 @@ int cmd_apply(int argc,
>                                    &state, &force_apply, &options,
>                                    apply_usage);
>
> +       if (repo) {
> +               prepare_repo_settings(repo);
> +               repo->settings.command_requires_full_index = 0;
> +       }
> +
>         if (check_apply_state(&state, force_apply))
>                 exit(128);
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index f9b448792cb4..ab8bd371eff3 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1340,6 +1340,30 @@ test_expect_success 'submodule handling' '
>         grep "160000 $(git -C initial-repo rev-parse HEAD) 0    modules/sub" cache
>  '
>
> +test_expect_success 'git apply functionality' '
> +       init_repos &&
> +
> +       test_all_match git checkout base &&
> +
> +       git -C full-checkout diff base..merge-right -- deep >patch-in-sparse &&
> +       git -C full-checkout diff base..merge-right -- folder2 >patch-outside &&
> +
> +       # Apply a patch to a file inside the sparse definition
> +       test_all_match git apply --index --stat ../patch-in-sparse &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       # Apply a patch to a file outside the sparse definition
> +       test_sparse_match test_must_fail git apply ../patch-outside &&
> +       grep "No such file or directory" sparse-checkout-err &&

I was slightly confused by this at first, because I was thinking of
the case where folder2/a exists in the working directory despite not
matching the sparsity patterns, but you were testing a case where the
working directory matched the sparsity patterns, so folder2/a doesn't
exist.

So, the check here looks good.

And, when folder2/a does exist, then we're in the case handled by
82386b44963f (Merge branch 'en/present-despite-skipped', 2022-03-09),
which forces the directory to not be considered sparse, and so it's
just like the ../patch-in-sparse case...meaning it's not really a
different case to test.

So, it all makes sense.

> +
> +       # But it works with --index and --cached
> +       test_all_match git apply --index --stat ../patch-outside &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git reset --hard &&
> +       test_all_match git apply --cached --stat ../patch-outside &&
> +       test_all_match git status --porcelain=v2
> +'
> +
>  # When working with a sparse index, some commands will need to expand the
>  # index to operate properly. If those commands also write the index back
>  # to disk, they need to convert the index to sparse before writing.
> @@ -2345,6 +2369,28 @@ test_expect_success 'sparse-index is not expanded: check-attr' '
>         ensure_not_expanded check-attr -a --cached -- folder1/a
>  '
>
> +test_expect_success 'sparse-index is not expanded: git apply' '
> +       init_repos &&
> +
> +       git -C sparse-index checkout base &&
> +       git -C full-checkout diff base..merge-right -- deep >patch-in-sparse &&
> +       git -C full-checkout diff base..merge-right -- folder2 >patch-outside &&
> +
> +       # Apply a patch to a file inside the sparse definition
> +       ensure_not_expanded apply --index --stat ../patch-in-sparse &&
> +
> +       # Apply a patch to a file outside the sparse definition
> +       # Fails when caring about the worktree.
> +       ensure_not_expanded ! apply ../patch-outside &&
> +
> +       # Expands when using --index.
> +       ensure_expanded apply --index ../patch-outside &&
> +       git -C sparse-index reset --hard &&

All makes sense up to here.

> +
> +       # Does not expand when using --cached.
> +       ensure_not_expanded apply --cached ../patch-outside

Wait, what?  That makes no sense.

After some digging, I see why the test passed, but it's very
misleading.  Just before this command, if you ran the following
commands from the sparse-index directory, you'd see the following:

$ rm testme
$ GIT_TRACE2_EVENT=$(pwd)/testme git apply --cached ../patch-outside
$ grep ensure_full_index testme
$

Which matches what you were testing and shows why it passed for you.
But I'd argue the test is not correct and confusing for anyone that
reads it, because:

$ git ls-files -s --sparse | grep folder2
040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0    folder2/0/
100644 78981922613b2afb6025042ff6bd878ac1994e85 0    folder2/a

In other words, the index was *already* (partially) expanded by the
`git apply --index`, and the `git reset --hard` did not fix that
contrary to expectations.  Continuing from here we see:

$ git reset --hard
HEAD is now at 703fd3e initial commit
$ git ls-files -s --sparse | grep folder2
040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0    folder2/0/
100644 78981922613b2afb6025042ff6bd878ac1994e85 0    folder2/a
$ git sparse-checkout reapply
$ git ls-files -s --sparse | grep folder2
040000 123706f6fc38949628eaf0483edbf97ba21123ae 0    folder2/

So, we need to do a `git sparse-checkout reapply` to make sure we were
actually in the expected fully sparse state.  From here...

$ rm testme
$ GIT_TRACE2_EVENT=$(pwd)/testme git apply --cached ../patch-outside
$ grep ensure_full_index testme
{"event":"region_enter","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.856008Z","file":"sparse-index.c","line":372,"repo":1,"nesting":1,"category":"index","label":"ensure_full_index"}
{"event":"region_leave","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.856454Z","file":"sparse-index.c","line":455,"repo":1,"t_rel":0.000446,"nesting":1,"category":"index","label":"ensure_full_index"}
{"event":"region_enter","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.857016Z","file":"sparse-index.c","line":372,"repo":1,"nesting":1,"category":"index","label":"ensure_full_index"}
{"event":"region_leave","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.857135Z","file":"sparse-index.c","line":455,"repo":1,"t_rel":0.000119,"nesting":1,"category":"index","label":"ensure_full_index"}

So, indeed, `git apply --cached ../patch-outside` DOES expand the
index, as I expected.  It has to when folder2/ is a directory in the
index, so that we can get a folder2/a entry that we can modify.  And
that's just what we see:

$ git ls-files -s --sparse | grep folder2
040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0    folder2/0/
100644 50f4ca4e6265a0497ec2ee6782648138914ad398 0    folder2/a


Can you add a `git sparse-checkout reapply` right after your `git
reset --hard`, and then switch the ensure_not_expanded to
ensure_expanded for the apply --cached call?





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux