Re: [PATCH] ls-files: conditionally leave index sparse

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 15, 2025 at 9:13 AM Derrick Stolee via GitGitGadget
<gitgitgadget@xxxxxxxxx> wrote:
>
> From: Derrick Stolee <stolee@xxxxxxxxx>
>
> When running 'git ls-files' with a pathspec, the index entries get
> filtered according to that pathspec before iterating over them in

When I first read this patch, I missed this part of your commit
message and figured there was no possible way your patch could
actually speed things up.  I verified with your testcase that it
worked, though, and had to step through a debugger to find out what I
was missing.  It's the prune_index() call in cmd_ls_files() that does
this -- but only when the pathspecs provided have some common prefix.
So, it's not unique to when there's a single pathspec as your commit
message claims, and the pointer to prune_index() may have helped save
me some head-scratching in review the patch.

Perhaps this could be clarified here (and made more explicit for folks
like me that gloss over it), something like

When running 'git ls-files' with pathspecs with a common prefix, the
index entries get
filtered according to that common prefix in prune_index() before
iterating over them in show_files().

> show_files().  In 78087097b8 (ls-files: add --sparse option,
> 2021-12-22), this iteration was prefixed with a check for the '--sparse'
> option which allows the command to output directory entries; this
> created a pre-loop call to ensure_full_index().
>
> However, when a user runs 'git ls-files' where the pathspec matches
> directories that are recursively matched in the sparse-checkout, there
> are not any sparse directories that match the pathspec so they would not
> be written to the output. The expansion in this case is just a
> performance drop for no behavior difference.
>
> Replace this global check to expand the index with a check inside the
> loop for a matched sparse directory. If we see one, then expand the
> index and continue from the current location. This is safe since the
> previous entries in the index did not have any sparse directories and
> thus would remain stable in this expansion.
>
> A test in t1092 confirms that this changes the behavior.
>
> Signed-off-by: Derrick Stolee <stolee@xxxxxxxxx>
> ---
>     ls-files: conditionally leave index sparse
>
>     Here's a small sparse index performance update based on a user report.
>
>     Thanks, -Stolee
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1955%2Fderrickstolee%2Fls-files-sparse-index-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1955/derrickstolee/ls-files-sparse-index-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1955
>
>  builtin/ls-files.c                       | 13 ++++++++++---
>  t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++++++
>  2 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> index c06a6f33e41..b148607f7a1 100644
> --- a/builtin/ls-files.c
> +++ b/builtin/ls-files.c
> @@ -414,14 +414,21 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>         if (!(show_cached || show_stage || show_deleted || show_modified))
>                 return;
>
> -       if (!show_sparse_dirs)
> -               ensure_full_index(repo->index);
> -
>         for (i = 0; i < repo->index->cache_nr; i++) {
>                 const struct cache_entry *ce = repo->index->cache[i];
>                 struct stat st;
>                 int stat_err;
>
> +               if (S_ISSPARSEDIR(ce->ce_mode) && !show_sparse_dirs) {
> +                       /*
> +                        * This is the first time we've hit a sparse dir,
> +                        * so expansion will leave the first 'i' entries
> +                        * alone.
> +                        */
> +                       ensure_full_index(repo->index);
> +                       ce = repo->index->cache[i];
> +               }

I see how this is safe.  I didn't understand how it helped performance
until I figured out by stepping through that repo->indexc->cache_nr is
much less than I expected, because of the prune_index() call that
happened earlier.

>                 construct_fullname(&fullname, repo, ce);
>
>                 if ((dir->flags & DIR_SHOW_IGNORED) &&
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index d8101139b40..b0f691c151a 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1506,6 +1506,8 @@ test_expect_success 'sparse-index is not expanded' '
>         ensure_not_expanded reset --hard &&
>         ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
>
> +       ensure_not_expanded ls-files deep/deeper1 &&
> +

Thanks, this testcase is exactly what I needed to figure out what I
was misunderstanding.

>         echo >>sparse-index/README.md &&
>         ensure_not_expanded add -A &&
>         echo >>sparse-index/extra.txt &&
> @@ -1607,6 +1609,17 @@ test_expect_success 'describe tested on all' '
>         test_all_match git describe --dirty
>  '
>
> +test_expect_success 'ls-files filtering and expansion' '
> +       init_repos &&
> +
> +       # This filtering will hit a sparse directory midway
> +       # through the iteration.
> +       test_all_match git ls-files deep &&
> +
> +       # This pathspec will filter the index to only a sparse
> +       # directory.
> +       test_all_match git ls-files folder1
> +'

Looks good.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux