Re: [PATCH 2/3] sparse-checkout: add 'clean' command

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 08, 2025 at 11:19:52AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/Documentation/git-sparse-checkout.adoc b/Documentation/git-sparse-checkout.adoc
> index 529a8edd9c1e..21ba6f759905 100644
> --- a/Documentation/git-sparse-checkout.adoc
> +++ b/Documentation/git-sparse-checkout.adoc
> @@ -111,6 +111,17 @@ flags, with the same meaning as the flags from the `set` command, in order
>  to change which sparsity mode you are using without needing to also respecify
>  all sparsity paths.
>  
> +'clean'::
> +	Remove all files in tracked directories that are outside of the
> +	sparse-checkout definition. This subcommand requires cone-mode
> +	sparse-checkout to be sure that we know which directories are
> +	both tracked and all contained paths are not in the sparse-checkout.
> +	This command can be used to be sure the sparse index works
> +	efficiently.
> ++
> +The `clean` command can also take the `--dry-run` (`-n`) option to list
> +the directories it would remove without performing any filesystem changes.
> +

Hm. This is somewhat different from `git clean`, where you have to pass
`-f` to make it delete any data. I'm not particularly a fan of that
mode, but should we maybe retain it regardless to ensure that things are
at least a tiny bit more consistent?

> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 8b70d0c6a441..6d2843827367 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -924,6 +924,76 @@ static int sparse_checkout_reapply(int argc, const char **argv,
>  	return update_working_directory(repo, NULL);
>  }
>  
> +static char const * const builtin_sparse_checkout_clean_usage[] = {
> +	"git sparse-checkout clean [-n|--dry-run]",
> +	NULL
> +};
> +
> +static struct sparse_checkout_clean_opts {
> +	int dry_run;
> +} clean_opts;
> +
> +static int sparse_checkout_clean(int argc, const char **argv,
> +				   const char *prefix,
> +				   struct repository *repo)
> +{
> +	struct strbuf full_path = STRBUF_INIT;
> +	size_t worktree_len;
> +	static struct option builtin_sparse_checkout_clean_options[] = {
> +		OPT_BOOL('n', "dry-run", &clean_opts.dry_run,
> +			 N_("list the directories that would be removed without making filesystem changes")),
> +		OPT_END(),
> +	};
> +
> +	setup_work_tree();
> +	if (!core_apply_sparse_checkout)
> +		die(_("must be in a sparse-checkout to clean directories"));
> +	if (!core_sparse_checkout_cone)
> +		die(_("must be in a cone-mode sparse-checkout to clean directories"));
> +
> +	argc = parse_options(argc, argv, prefix,
> +			     builtin_sparse_checkout_clean_options,
> +			     builtin_sparse_checkout_clean_usage, 0);
> +
> +	if (repo_read_index(repo) < 0)
> +		die(_("failed to read index"));
> +
> +	if (convert_to_sparse(repo->index, SPARSE_INDEX_MEMORY_ONLY))
> +		die(_("failed to convert index to a sparse index"));

I noticed that there are several cases in `convert_to_sparse()` where we
simply do nothing. Should we check whether `repo->index->sparse_index`
matches `INDEX_COLLAPSED` after the operation?

> +	strbuf_addstr(&full_path, repo->worktree);
> +	strbuf_addch(&full_path, '/');
> +	worktree_len = full_path.len;
> +
> +	for (size_t i = 0; i < repo->index->cache_nr; i++) {
> +		DIR* dir;

Nit: the `*` goes with the variable, not the type.

> +		struct cache_entry *ce = repo->index->cache[i];
> +		if (!S_ISSPARSEDIR(ce->ce_mode))
> +			continue;

Okay, we only need to handle sparse directories.

> +		strbuf_setlen(&full_path, worktree_len);
> +		strbuf_add(&full_path, ce->name, ce->ce_namelen);
> +
> +		dir = opendir(full_path.buf);

Shouldn't it be sufficient to use `is_directory()`?

> +		if (!dir)
> +			continue;

This is the good and expected case, right? The entry is sparse, so
ideally it doesn't exist. If it does we have to recurse into to end up
with the full index.

> +		else if (ENOENT != errno) {

Nit: style. If one branches requires curly braces, all branches should
use them.

> +			warning_errno(_("failed to check for existence of '%s'"), ce->name);
> +			continue;
> +		}
> +
> +		closedir(dir);
> +
> +		printf("%s\n", ce->name);

git-clean(1) says "Removing %s\n". Should we do the same here?

> +		if (!clean_opts.dry_run) {
> +			if (remove_dir_recursively(&full_path, 0))
> +				warning_errno(_("failed to remove '%s'"), ce->name);
> +		}
> +	}
> +
> +	strbuf_release(&full_path);
> +	return 0;
> +}
> +
>  static char const * const builtin_sparse_checkout_disable_usage[] = {
>  	"git sparse-checkout disable",
>  	NULL

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux