When using cone-mode sparse-checkout, users specify which tracked directories they want (recursively), and any directory not part of the parent paths for those directories is considered "out of scope". When changing sparse-checkouts, there are a variety of reasons why these "out of scope" directories could remain, including:

 * The user has .gitignore or .git/info/exclude files that tell Git to not remove files of a certain type.
 * Some filesystem blocker prevented the removal of a tracked file. This is usually more of an issue on Windows, where a read handle will block file deletion.

Typically, this would not mean too much for the user experience. A few extra filesystem checks might be required to satisfy 'git status' commands, but the performance hit is proportional to how many cruft files are left over in this situation.

However, when using the sparse index, these tracked sparse directories cause significant performance issues. When Git notices that the index contains a sparse directory but that directory exists on disk, it needs to expand that sparse directory to determine which files are tracked or untracked. The current mechanism expands the entire index to a full one, an expensive operation that scales with the total number of paths at HEAD and not just the number of cruft files left over.

Advice was added in 9479a31d603 (advice: warn when sparse index expands, 2024-07-08) to help users determine that they were in this state. However, the advice doesn't actually recommend helpful ways to get out of this state. Recommending "git clean" on its own is incomplete, as users typically need 'git clean -dfx' to clear out the ignored or excluded files. Even then, they may need 'git sparse-checkout reapply' afterwards to clear the sparse directories.

The advice was successful in alerting users to the problem, which is how I got wind of many of the ways users get into this state.
It's now time to give them a tool that helps them out of this state.

This series adds a new 'git sparse-checkout clean' command that currently only works for cone-mode sparse-checkouts. The only thing it does is collapse the index to a sparse index (as much as possible) and make sure that any sparse directories are removed. The removed directories are listed on stdout.

A --dry-run option is available to list the directories that would be removed without actually deleting them.

This command would be preferred to something like 'git clean -dfx' since it does not clear the excluded files that are still within the sparse-checkout. Instead, it performs the exact filesystem operations required to restore sparse index performance to what is expected.

I spent a few weeks debating with myself about whether or not this was the right interface, so please suggest alternatives if you have better ideas. My rejected ideas include:

 * 'git sparse-checkout reapply -f -x' or similar augmentations of 'reapply'.
 * 'git clean --sparse' to focus the clean operation on things outside of the sparse-checkout.

The implementation is rather simple with the current CLI. Future augmentations could include a --quiet option to silence the output and a --verbose option to list the files that exist within each directory and would/will be removed.
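
For comparison, here is a minimal sketch of the manual recovery that users need today, run in a throwaway repository. The directory names ('in', 'out', 'out/build') and the cruft file are hypothetical and exist only for illustration. The commented lines at the end show how the proposed command would be invoked as described in this series; they are not executed because they assume the series is applied.

```shell
#!/bin/sh
# Sketch only: throwaway repository with hypothetical directory names.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Two tracked directories: 'in' stays in the cone, 'out' falls outside it.
mkdir in out
echo a >in/a.txt
echo b >out/b.txt
git add .
git commit -q -m "initial"

# Restrict the worktree to 'in' using cone mode.
git sparse-checkout init --cone
git sparse-checkout set in

# Simulate an untracked leftover in an out-of-scope directory.
mkdir -p out/build
echo junk >out/build/cruft.o

# Manual recovery with existing commands. Note that 'git clean -dfx'
# would also delete ignored files inside the cone, which is the drawback.
git clean -q -dfx
git sparse-checkout reapply

# With this series applied, the equivalent would be (per the text above):
#   git sparse-checkout clean --dry-run   # list directories that would be removed
#   git sparse-checkout clean             # remove them
```

The sketch does not enable the sparse index, so it only illustrates the cleanup steps and not the performance problem itself.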
Thanks,
-Stolee

Derrick Stolee (3):
  sparse-checkout: remove use of the_repository
  sparse-checkout: add 'clean' command
  sparse-index: point users to new 'clean' action

 Documentation/git-sparse-checkout.adoc |  13 +-
 builtin/sparse-checkout.c              | 192 +++++++++++++++++--------
 sparse-index.c                         |   3 +-
 t/t1091-sparse-checkout-builtin.sh     |  48 +++++++
 4 files changed, 197 insertions(+), 59 deletions(-)


base-commit: 8b6f19ccfc3aefbd0f22f6b7d56ad6a3fc5e4f37
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1941%2Fderrickstolee%2Fgit-sparse-checkout-clean-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1941/derrickstolee/git-sparse-checkout-clean-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1941
-- 
gitgitgadget