[PATCH v2 0/8] sparse-checkout: add 'clean' command

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



NEW: This series is based on 2c5b5565981 (environment: remove the global
variable 'sparse_expect_files_outside_of_patterns', 2025-07-01) to build
upon those cleanups in builtin/sparse-checkout.c.

When using cone-mode sparse-checkout, users specify which tracked
directories they want (recursively) and any directory not part of the parent
paths for those directories are considered "out of scope". When changing
sparse-checkouts, there are a variety of reasons why these "out of scope"
directories could remain, including:

 * The user has .gitignore or .git/info/exclude files that tell Git to not
   remove files of a certain type.
 * Some filesystem blocker prevented the removal of a tracked file. This is
   usually more of an issue on Windows where a read handle will block file
   deletion.

Typically, this would not mean too much for the user experience. A few extra
filesystem checks might be required to satisfy git status commands, but the
scope of the performance hit is relative to how many cruft files are left
over in this situation.

However, when using the sparse index, these tracked sparse directories cause
significant performance issues. When noticing that the index contains a
sparse directory but that directory exists on disk, Git needs to expand that
sparse directory to determine which files are tracked or untracked. The
current mechanism expands the entire index to a full one, an expensive
operation that scales with the total number of paths at HEAD and not just
the number of cruft files left over.

Advice was added in 9479a31d603 (advice: warn when sparse index expands,
2024-07-08) to help users determine that they were in this state. However,
the advice doesn't actually recommend helpful ways to get out of this state.
Recommending "git clean" on its own is incomplete, as typically users
actually need 'git clean -dfx' to clear out the ignored or excluded files.
Even then, they may need 'git sparse-checkout reapply' afterwards to clear
the sparse directories.

The advice was successful in helping to alert users to the problem, which is
how I got wind of many of these cases for how users get into this state.
It's now time to give them a tool that helps them out of this state.

This series adds a new 'git sparse-checkout clean' command that currently
only works for cone-mode sparse-checkouts. The only thing it does is
collapse the index to a sparse index (as much as possible) and make sure
that any sparse directories are removed. These directories are listed to
stdout.

This command uses the same '--force' and '--dry-run' options as 'git clean',
with integrations with the 'clean.requireForce' config option. There are
some concerns that this isn't an obvious way to work with the 'git clean'
command, but I thought we should be consistent here. I did change the error
message to point users to the necessary options.

This option would be preferred to something like 'git clean -dfx' since it
does not clear the excluded files that are still within the sparse-checkout.
Instead, it performs the exact filesystem operations required to refresh the
sparse index performance back to what is expected.

I spent a few weeks debating with myself about whether or not this was the
right interface, so please suggest alternatives if you have better ideas.
Among my rejected ideas include:

 * 'git sparse-checkout reapply -f -x' or similar augmentations of
   'reapply'.
 * 'git clean --sparse' to focus the clean operation on things outside of
   the sparse-checkout.


Updates in V2
=============

 * This series is based on 2c5b5565981 (environment: remove the global
   variable 'sparse_expect_files_outside_of_patterns', 2025-07-01) to build
   upon those cleanups in builtin/sparse-checkout.c.
 * The --force and --dry-run options match 'git clean'.
 * A --verbose option is added. It does not link to the index for
   tracked/untracked/ignored/excluded or clean/modified/staged/conflicted
   status, but instead gives the full list for information.
 * To support the --verbose option, a new for_each_file_in_dir() method is
   added to dir.h.
 * Tests are added to demonstrate the behavior when a sparse directory has a
   merge conflict (fails with an explanation). When adding the test based on
   the previous version's functionality, I realized that the behavior is
   sometimes less effective than git sparse-checkout reapply even after a
   sparse file is committed. To demonstrate this change, the full test is
   created on its own and then a code change is added with the impact on the
   test.

Thanks, -Stolee

Derrick Stolee (8):
  sparse-checkout: remove use of the_repository
  sparse-checkout: add basics of 'clean' command
  sparse-checkout: match some 'clean' behavior
  dir: add generic "walk all files" helper
  sparse-checkout: add --verbose option to 'clean'
  sparse-index: point users to new 'clean' action
  t: expand tests around sparse merges and clean
  sparse-checkout: make 'clean' clear more files

 Documentation/git-sparse-checkout.adoc |  25 ++-
 builtin/sparse-checkout.c              | 230 ++++++++++++++++++-------
 dir.c                                  |  28 +++
 dir.h                                  |  14 ++
 sparse-index.c                         |   3 +-
 t/t1091-sparse-checkout-builtin.sh     | 130 ++++++++++++++
 unpack-trees.c                         |   2 +-
 7 files changed, 371 insertions(+), 61 deletions(-)


base-commit: 2c5b556598191ae64159dc998dc8f0917d412808
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1941%2Fderrickstolee%2Fgit-sparse-checkout-clean-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1941/derrickstolee/git-sparse-checkout-clean-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1941

Range-diff vs v1:

 1:  3cdc44a9e8c ! 1:  92d0cd41a41 sparse-checkout: remove use of the_repository
     @@ builtin/sparse-checkout.c: static enum sparse_checkout_mode update_cone_mode(int
       	int mode, record_mode;
       
      @@ builtin/sparse-checkout.c: static int update_modes(int *cone_mode, int *sparse_index)
     - 	record_mode = (*cone_mode != -1) || !core_apply_sparse_checkout;
     + 	record_mode = (*cone_mode != -1) || !the_repository->settings.sparse_checkout;
       
       	mode = update_cone_mode(cone_mode);
      -	if (record_mode && set_config(mode))
     @@ builtin/sparse-checkout.c: static void add_patterns_literal(int argc, const char
       {
       	int result;
      @@ builtin/sparse-checkout.c: static int modify_pattern_list(struct strvec *args, int use_stdin,
     + 		break;
       	}
       
     - 	if (!core_apply_sparse_checkout) {
     +-	if (!the_repository->settings.sparse_checkout) {
      -		set_config(MODE_ALL_PATTERNS);
     +-		the_repository->settings.sparse_checkout = 1;
     ++	if (!repo->settings.sparse_checkout) {
      +		set_config(repo, MODE_ALL_PATTERNS);
     - 		core_apply_sparse_checkout = 1;
     ++		repo->settings.sparse_checkout = 1;
       		changed_config = 1;
       	}
       
     @@ builtin/sparse-checkout.c: static struct sparse_checkout_add_opts {
       	static struct option builtin_sparse_checkout_add_options[] = {
       		OPT_BOOL_F(0, "skip-checks", &add_opts.skip_checks,
      @@ builtin/sparse-checkout.c: static int sparse_checkout_add(int argc, const char **argv, const char *prefix,
     - 	if (!core_apply_sparse_checkout)
     + 	if (!the_repository->settings.sparse_checkout)
       		die(_("no sparse-checkout to add to"));
       
      -	repo_read_index(the_repository);
     @@ builtin/sparse-checkout.c: static int sparse_checkout_disable(int argc, const ch
       
       	add_pattern("/*", empty_base, 0, &pl, 0);
       
     --	prepare_repo_settings(the_repository);
      -	the_repository->settings.sparse_index = 0;
     -+	prepare_repo_settings(repo);
      +	repo->settings.sparse_index = 0;
       
      -	if (update_working_directory(&pl))
 2:  49418e8ec8a ! 2:  7e8f7c2d6c8 sparse-checkout: add 'clean' command
     @@ Metadata
      Author: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
      
       ## Commit message ##
     -    sparse-checkout: add 'clean' command
     +    sparse-checkout: add basics of 'clean' command
      
          When users change their sparse-checkout definitions to add new
          directories and remove old ones, there may be a few reasons why
     @@ Commit message
          not be sufficient.
      
          Add a new subcommand to 'git sparse-checkout' that removes these
     -    tracked-but-sparse directories, including any excluded or ignored files
     -    underneath. This is the most extreme method for doing this, but it works
     +    tracked-but-sparse directories. This necessarily removes all files
     +    contained within, including tracked and untracked files. Of particular
     +    importance are ignored and excluded files which would normally be
     +    ignored even by 'git clean -f' unless the '-x' or '-X' option is
     +    provided. This is the most extreme method for doing this, but it works
          when the sparse-checkout is in cone mode and is expected to rescope
          based on directories, not files.
      
     -    Be sure to add a --dry-run option so users can predict what will be
     -    deleted. In general, output the directories that are being removed so
     -    users can know what was removed.
     +    The current implementation always deletes these sparse directories
     +    without warning. This is unacceptable for a released version, but those
     +    features will be added in changes coming immediately after this one.
      
     -    Note that untracked directories remain. Further, directories that
     -    contain staged changes are not deleted. This is a detail that is partly
     -    hidden by the implementation which relies on collapsing the index to a
     -    sparse index in-memory and only deleting directories that are listed as
     -    sparse in the index. If a staged change exists, then that entry is not
     -    stored as a sparse tree entry and thus remains on-disk until committed
     -    or reset.
     +    Note that untracked directories within the sparse-checkout remain.
     +    Further, directories that contain staged changes or files in merge
     +    conflict states are not deleted. This is a detail that is partly hidden
     +    by the implementation which relies on collapsing the index to a sparse
     +    index in-memory and only deleting directories that are listed as sparse
     +    in the index.
     +
     +    If a staged change exists, then that entry is not stored as a sparse
     +    tree entry and thus remains on-disk until committed or reset.
     +
     +    There are some interesting cases around merge conflict resolution, but
     +    that will be carefully analyzed in the future.
      
          Signed-off-by: Derrick Stolee <stolee@xxxxxxxxx>
      
     @@ Documentation/git-sparse-checkout.adoc: flags, with the same meaning as the flag
      +	sparse-checkout to be sure that we know which directories are
      +	both tracked and all contained paths are not in the sparse-checkout.
      +	This command can be used to be sure the sparse index works
     -+	efficiently.
     -++
     -+The `clean` command can also take the `--dry-run` (`-n`) option to list
     -+the directories it would remove without performing any filesystem changes.
     ++	efficiently, though it does not require enabling the sparse index
     ++  feature via the `index.sparse=true` configuration.
      +
       'disable'::
       	Disable the `core.sparseCheckout` config setting, and restore the
       	working directory to include all files.
      
       ## builtin/sparse-checkout.c ##
     +@@
     + #define DISABLE_SIGN_COMPARE_WARNINGS
     + 
     + #include "builtin.h"
     ++#include "abspath.h"
     + #include "config.h"
     + #include "dir.h"
     + #include "environment.h"
      @@
       static const char *empty_base = "";
       
     @@ builtin/sparse-checkout.c: static int sparse_checkout_reapply(int argc, const ch
      +	NULL
      +};
      +
     -+static struct sparse_checkout_clean_opts {
     -+	int dry_run;
     -+} clean_opts;
     ++static const char *msg_remove = N_("Removing %s\n");
      +
      +static int sparse_checkout_clean(int argc, const char **argv,
      +				   const char *prefix,
      +				   struct repository *repo)
      +{
      +	struct strbuf full_path = STRBUF_INIT;
     ++	const char *msg = msg_remove;
      +	size_t worktree_len;
     -+	static struct option builtin_sparse_checkout_clean_options[] = {
     -+		OPT_BOOL('n', "dry-run", &clean_opts.dry_run,
     -+			 N_("list the directories that would be removed without making filesystem changes")),
     ++
     ++	struct option builtin_sparse_checkout_clean_options[] = {
      +		OPT_END(),
      +	};
      +
      +	setup_work_tree();
     -+	if (!core_apply_sparse_checkout)
     ++	if (!repo->settings.sparse_checkout)
      +		die(_("must be in a sparse-checkout to clean directories"));
     -+	if (!core_sparse_checkout_cone)
     ++	if (!repo->settings.sparse_checkout_cone)
      +		die(_("must be in a cone-mode sparse-checkout to clean directories"));
      +
      +	argc = parse_options(argc, argv, prefix,
     @@ builtin/sparse-checkout.c: static int sparse_checkout_reapply(int argc, const ch
      +	if (repo_read_index(repo) < 0)
      +		die(_("failed to read index"));
      +
     -+	if (convert_to_sparse(repo->index, SPARSE_INDEX_MEMORY_ONLY))
     -+		die(_("failed to convert index to a sparse index"));
     ++	if (convert_to_sparse(repo->index, SPARSE_INDEX_MEMORY_ONLY) ||
     ++	    repo->index->sparse_index == INDEX_EXPANDED)
     ++		die(_("failed to convert index to a sparse index; resolve merge conflicts and try again"));
      +
      +	strbuf_addstr(&full_path, repo->worktree);
      +	strbuf_addch(&full_path, '/');
      +	worktree_len = full_path.len;
      +
      +	for (size_t i = 0; i < repo->index->cache_nr; i++) {
     -+		DIR* dir;
      +		struct cache_entry *ce = repo->index->cache[i];
      +		if (!S_ISSPARSEDIR(ce->ce_mode))
      +			continue;
      +		strbuf_setlen(&full_path, worktree_len);
      +		strbuf_add(&full_path, ce->name, ce->ce_namelen);
      +
     -+		dir = opendir(full_path.buf);
     -+		if (!dir)
     -+			continue;
     -+		else if (ENOENT != errno) {
     -+			warning_errno(_("failed to check for existence of '%s'"), ce->name);
     ++		if (!is_directory(full_path.buf))
      +			continue;
     -+		}
      +
     -+		closedir(dir);
     ++		printf(msg, ce->name);
      +
     -+		printf("%s\n", ce->name);
     -+		if (!clean_opts.dry_run) {
     -+			if (remove_dir_recursively(&full_path, 0))
     -+				warning_errno(_("failed to remove '%s'"), ce->name);
     -+		}
     ++		if (remove_dir_recursively(&full_path, 0))
     ++			warning_errno(_("failed to remove '%s'"), ce->name);
      +	}
      +
      +	strbuf_release(&full_path);
     @@ t/t1091-sparse-checkout-builtin.sh: test_expect_success 'check-rules null termin
      +	touch repo/folder1/file &&
      +
      +	cat >expect <<-\EOF &&
     -+	deep/deeper2/
     -+	folder1/
     ++	Removing deep/deeper2/
     ++	Removing folder1/
      +	EOF
      +
     -+	git -C repo sparse-checkout clean --dry-run >out &&
     -+	test_cmp expect out &&
     -+
     -+	test_path_exists repo/deep/deeper2 &&
     -+	test_path_exists repo/folder1 &&
     -+
      +	git -C repo sparse-checkout clean >out &&
      +	test_cmp expect out &&
      +
     -+	! test_path_exists repo/deep/deeper2 &&
     -+	! test_path_exists repo/folder1
     ++	test_path_is_missing repo/deep/deeper2 &&
     ++	test_path_is_missing repo/folder1
      +'
      +
      +test_expect_success 'clean with staged sparse change' '
      +	git -C repo sparse-checkout set --cone deep/deeper1 &&
     -+	mkdir repo/deep/deeper2 repo/folder1 &&
     ++	mkdir repo/deep/deeper2 repo/folder1 repo/folder2 &&
      +	touch repo/deep/deeper2/file &&
      +	touch repo/folder1/file &&
     ++	echo dirty >repo/folder2/a &&
      +
      +	git -C repo add --sparse folder1/file &&
      +
     ++	# deletes deep/deeper2/ but leaves folder1/ and folder2/
      +	cat >expect <<-\EOF &&
     -+	deep/deeper2/
     ++	Removing deep/deeper2/
      +	EOF
      +
     -+	git -C repo sparse-checkout clean --dry-run >out &&
     -+	test_cmp expect out &&
     -+
     -+	test_path_exists repo/deep/deeper2 &&
     -+	test_path_exists repo/folder1 &&
     -+
      +	git -C repo sparse-checkout clean >out &&
      +	test_cmp expect out &&
      +
     -+	! test_path_exists repo/deep/deeper2 &&
     ++	test_path_is_missing repo/deep/deeper2 &&
      +	test_path_exists repo/folder1
      +'
       
 -:  ----------- > 3:  221f3e5fb0c sparse-checkout: match some 'clean' behavior
 -:  ----------- > 4:  fd9a20a3922 dir: add generic "walk all files" helper
 -:  ----------- > 5:  f464bb5ed6b sparse-checkout: add --verbose option to 'clean'
 3:  80d7a7641da = 6:  d6dbc0b5ca9 sparse-index: point users to new 'clean' action
 -:  ----------- > 7:  0b1a2895b90 t: expand tests around sparse merges and clean
 -:  ----------- > 8:  82c24ce5198 sparse-checkout: make 'clean' clear more files

-- 
gitgitgadget




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux