[PATCH 0/3] sparse-checkout: add 'clean' command

When using cone-mode sparse-checkout, users specify which tracked
directories they want (recursively), and any directory that is neither
inside those directories nor along their parent paths is considered "out of
scope". When changing sparse-checkouts, there are a variety of reasons why
these "out of scope" directories could remain on disk, including:

 * The user has .gitignore or .git/info/exclude files that tell Git to not
   remove files of a certain type.
 * Some filesystem blocker prevented the removal of a tracked file. This is
   usually more of an issue on Windows where a read handle will block file
   deletion.

Typically, this would not mean too much for the user experience. A few extra
filesystem checks might be required to satisfy git status commands, but the
scope of the performance hit is relative to how many cruft files are left
over in this situation.

However, when using the sparse index, these tracked sparse directories cause
significant performance issues. When noticing that the index contains a
sparse directory but that directory exists on disk, Git needs to expand that
sparse directory to determine which files are tracked or untracked. The
current mechanism expands the entire index to a full one, an expensive
operation that scales with the total number of paths at HEAD and not just
the number of cruft files left over.

Advice was added in 9479a31d603 (advice: warn when sparse index expands,
2024-07-08) to help users determine that they were in this state. However,
the advice doesn't actually recommend helpful ways to get out of this state.
Recommending "git clean" on its own is incomplete, as typically users
actually need 'git clean -dfx' to clear out the ignored or excluded files.
Even then, they may need 'git sparse-checkout reapply' afterwards to clear
the sparse directories.
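
For reference, the manual workaround described above can be sketched as a
small shell function. The function name is hypothetical and not part of this
series; both commands already exist in Git. Note that 'git clean -dfx'
deletes every untracked and ignored file in the working tree, not only those
in out-of-scope directories.

```shell
# Hypothetical helper (not part of this series): the manual cleanup
# sequence, run from the top of a cone-mode sparse-checkout worktree.
manual_sparse_cleanup () {
	# WARNING: removes ALL untracked and ignored files and directories,
	# including those that live inside the sparse-checkout.
	git clean -dfx &&
	# Re-apply the sparse-checkout patterns so that any leftover
	# out-of-scope directories get another chance to be removed.
	git sparse-checkout reapply
}
```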

The advice was successful in alerting users to the problem, which is how I
learned of many of the ways that users get into this state. It's now time to
give them a tool that helps them out of it.

This series adds a new 'git sparse-checkout clean' command that currently
only works for cone-mode sparse-checkouts. The only thing it does is
collapse the index to a sparse index (as much as possible) and make sure
that any sparse directories are removed. These directories are listed to
stdout.

A --dry-run option is available to list the directories that would be
removed without actually deleting the directories.
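
Without a Git build that includes this series, a rough approximation of the
--dry-run listing can be assembled from existing commands. The sketch below
is only an illustrative emulation, not how the new command is implemented:
the function name is hypothetical, it inspects top-level directories only,
and it assumes it is run from the top of a cone-mode worktree whose
directory names contain no unusual characters.

```shell
# Hypothetical helper: print top-level tracked directories that are outside
# the cone-mode sparse-checkout but still exist in the working tree.
list_stale_sparse_dirs () {
	# Directories currently in the cone, one per line.
	cone=$(git sparse-checkout list)

	# Top-level directories tracked at HEAD.
	git ls-tree -d --name-only HEAD |
	while IFS= read -r dir
	do
		# In scope if the directory is in the cone, or is a parent
		# of a directory that is in the cone.
		if printf '%s\n' "$cone" |
		   awk -v d="$dir" '$0 == d || index($0, d "/") == 1 { found = 1 }
				    END { exit !found }'
		then
			continue
		fi

		# Out of scope but present on disk: a cleanup candidate.
		if test -d "$dir"
		then
			printf '%s/\n' "$dir"
		fi
	done
}
```

With the series applied, 'git sparse-checkout clean --dry-run' followed by
'git sparse-checkout clean' is intended to make a helper like this
unnecessary.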

This command is preferable to something like 'git clean -dfx' since it does
not clear the excluded files that are still within the sparse-checkout.
Instead, it performs exactly the filesystem operations required to restore
sparse index performance to what is expected.
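
To see what 'git clean -dfx' would take with it, 'git clean' has its own
dry-run flag (-n) that lists the paths it would delete without touching
them. The wrapper name below is hypothetical; the flags are existing
'git clean' options.

```shell
# Hypothetical helper: preview, without deleting anything, every path that
# 'git clean -dfx' would remove. Ignored files inside the sparse-checkout
# (for example, build output) appear in this list alongside the files in
# out-of-scope directories.
preview_full_clean () {
	git clean -ndx
}
```

Any "Would remove" line that names a path inside the sparse-checkout is a
file the broader 'git clean -dfx' approach would delete unnecessarily.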

I spent a few weeks debating with myself about whether or not this was the
right interface, so please suggest alternatives if you have better ideas.
My rejected ideas include:

 * 'git sparse-checkout reapply -f -x' or similar augmentations of
   'reapply'.
 * 'git clean --sparse' to focus the clean operation on things outside of
   the sparse-checkout.

The implementation is rather simple with the current CLI. Future
augmentations could include a --quiet option to silence the output and a
--verbose option to list the files within each directory that would be
(with --dry-run) or will be removed.

Thanks, -Stolee

Derrick Stolee (3):
  sparse-checkout: remove use of the_repository
  sparse-checkout: add 'clean' command
  sparse-index: point users to new 'clean' action

 Documentation/git-sparse-checkout.adoc |  13 +-
 builtin/sparse-checkout.c              | 192 +++++++++++++++++--------
 sparse-index.c                         |   3 +-
 t/t1091-sparse-checkout-builtin.sh     |  48 +++++++
 4 files changed, 197 insertions(+), 59 deletions(-)


base-commit: 8b6f19ccfc3aefbd0f22f6b7d56ad6a3fc5e4f37
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1941%2Fderrickstolee%2Fgit-sparse-checkout-clean-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1941/derrickstolee/git-sparse-checkout-clean-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1941
-- 
gitgitgadget



