Re: [PATCH v2 2/8] sparse-checkout: add basics of 'clean' command

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 16, 2025 at 6:34 PM Derrick Stolee via GitGitGadget
<gitgitgadget@xxxxxxxxx> wrote:

Sorry for the long delay in responding...

[...]
> Add a new subcommand to 'git sparse-checkout' that removes these
> tracked-but-sparse directories. This necessarily removes all files
> contained within, including tracked and untracked files. Of particular

Nice to see tracked files also being addressed in v2.

> importance are ignored and excluded files which would normally be
> ignored even by 'git clean -f' unless the '-x' or '-X' option is
> provided. This is the most extreme method for doing this, but it works
> when the sparse-checkout is in cone mode and is expected to rescope
> based on directories, not files.
>
> The current implementation always deletes these sparse directories
> without warning. This is unacceptable for a released version, but those
> features will be added in changes coming immediately after this one.
>
> Note that untracked directories within the sparse-checkout remain.

You've changed the wording here relative to v1, but you haven't
addressed the part that was ambiguous/misleading in v1.  In fact, you
may have made a different part ambiguous as well, and made readers
think that this sentence contradicts your above claims that this
command is meant to clean out untracked directories underneath sparse
directories.  Perhaps something like:

"Note that untracked directories in the sparse-checkout that are not
within sparse directories will not be removed by this command; it only
cleans up paths under directories that are supposed to be sparse."

> Further, directories that contain staged changes or files in merge
> conflict states are not deleted.

Doesn't this sentence conflict with your above statement that "This
necessarily removes all files contained within, including tracked and
untracked files."?

> This is a detail that is partly hidden
> by the implementation which relies on collapsing the index to a sparse
> index in-memory and only deleting directories that are listed as sparse
> in the index.
>
> If a staged change exists, then that entry is not stored as a sparse
> tree entry and thus remains on-disk until committed or reset.

This seems a bit surprising -- if a file's modifications are staged,
then you can use it and other files in the index to write new trees
all the way up to the toplevel, so you should be able to get a sparse
tree entry without problem.  I'd only expect problems if you had
unstaged changes, or higher order stages; perhaps you could clarify
here?

> There are some interesting cases around merge conflict resolution, but
> that will be carefully analyzed in the future.

...okay, so you did clarify for the higher order stages, but I'm still
confused about staged vs unstaged without conflicts.

>
> Signed-off-by: Derrick Stolee <stolee@xxxxxxxxx>
> ---
>  Documentation/git-sparse-checkout.adoc | 11 ++++-
>  builtin/sparse-checkout.c              | 64 +++++++++++++++++++++++++-
>  t/t1091-sparse-checkout-builtin.sh     | 38 +++++++++++++++
>  3 files changed, 111 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/git-sparse-checkout.adoc b/Documentation/git-sparse-checkout.adoc
> index 529a8edd9c1e..6db88f00781d 100644
> --- a/Documentation/git-sparse-checkout.adoc
> +++ b/Documentation/git-sparse-checkout.adoc
> @@ -9,7 +9,7 @@ git-sparse-checkout - Reduce your working tree to a subset of tracked files
>  SYNOPSIS
>  --------
>  [verse]
> -'git sparse-checkout' (init | list | set | add | reapply | disable | check-rules) [<options>]
> +'git sparse-checkout' (init | list | set | add | reapply | disable | check-rules | clean) [<options>]
>
>
>  DESCRIPTION
> @@ -111,6 +111,15 @@ flags, with the same meaning as the flags from the `set` command, in order
>  to change which sparsity mode you are using without needing to also respecify
>  all sparsity paths.
>
> +'clean'::
> +       Remove all files in tracked directories that are outside of the
> +       sparse-checkout definition. This subcommand requires cone-mode

So, this sentence implies that it'll wipe out all untracked or ignored
files or tracked files with either unstaged, staged, or conflicted
entries.  Your commit message says it'll discuss conflicted entries
later, but conflicts about whether staged or unstaged changes will be
wiped.

> +       sparse-checkout to be sure that we know which directories are
> +       both tracked and all contained paths are not in the sparse-checkout.
> +       This command can be used to be sure the sparse index works
> +       efficiently, though it does not require enabling the sparse index

[...]
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
[...]
> +       if (convert_to_sparse(repo->index, SPARSE_INDEX_MEMORY_ONLY) ||
> +           repo->index->sparse_index == INDEX_EXPANDED)
> +               die(_("failed to convert index to a sparse index; resolve merge conflicts and try again"));

In the commit message, though, you said that it'd also fail to convert
a tree with a staged change to a sparse directory.  And I thought in
our discussion on v1 we found out that unstaged changes would prevent
converting to sparse.  Shouldn't the error message be more general,
then?

[...]
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index ab3a105ffff2..a48eedf766d2 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -1050,5 +1050,43 @@ test_expect_success 'check-rules null termination' '
>         test_cmp expect actual
>  '
>
> +test_expect_success 'clean' '
> +       git -C repo sparse-checkout set --cone deep/deeper1 &&
> +       mkdir repo/deep/deeper2 repo/folder1 &&
> +       touch repo/deep/deeper2/file &&
> +       touch repo/folder1/file &&
> +
> +       cat >expect <<-\EOF &&
> +       Removing deep/deeper2/
> +       Removing folder1/
> +       EOF
> +
> +       git -C repo sparse-checkout clean >out &&
> +       test_cmp expect out &&
> +
> +       test_path_is_missing repo/deep/deeper2 &&
> +       test_path_is_missing repo/folder1
> +'
> +
> +test_expect_success 'clean with staged sparse change' '
> +       git -C repo sparse-checkout set --cone deep/deeper1 &&
> +       mkdir repo/deep/deeper2 repo/folder1 repo/folder2 &&
> +       touch repo/deep/deeper2/file &&
> +       touch repo/folder1/file &&
> +       echo dirty >repo/folder2/a &&
> +
> +       git -C repo add --sparse folder1/file &&
> +
> +       # deletes deep/deeper2/ but leaves folder1/ and folder2/
> +       cat >expect <<-\EOF &&
> +       Removing deep/deeper2/
> +       EOF
> +
> +       git -C repo sparse-checkout clean >out &&
> +       test_cmp expect out &&
> +
> +       test_path_is_missing repo/deep/deeper2 &&
> +       test_path_exists repo/folder1

What about repo/folder2/ ?

Anyway, this test shows that neither staged nor unstaged changes are
cleaned up (which at least resolves the conflicting documentation you
provided on the matter) -- or would if you also checked repo/folder2.

What it doesn't show is that tracked files with neither staged nor
unstaged changes are not cleaned up either:

$ mkdir repo/folder2
$ echo dirty >repo/folder2/a
$ touch repo/folder2/untracked
$ cd repo
$ git status --porcelain
 M folder2/a
?? folder2/untracked

# So, we have both a unstaged change and an untracked file; let's undo
the unstaged change

$ git checkout HEAD folder2/a
Updated 1 path from 8cc814f
$ git status --porcelain
?? folder2/untracked
$ ls folder2/
a  untracked

# Both files are still present -- the untracked file, and the
untracked file with no changes either staged or unstaged -- what does
`git sparse-checkout clean` do?

$ git sparse-checkout clean
$ ls folder2/
a  untracked
$ git status --porcelain
?? folder2/untracked

# Absolutely nothing.  Not only does it not clean anything up, it
gives no warnings about not cleaning up what should be cleaned up.
Let's try sparse-checkout reapply:

$ git sparse-checkout reapply
warning: directory 'folder2/' contains untracked files, but is not in
the sparse-checkout cone
$ git status --porcelain
?? folder2/untracked
$ ls folder2/
untracked

# So `sparse-checkout reapply` does correctly remove folder2/a for us,
while warning about the untracked file.  (If folder2/a would have
still had changes, it would have warned about it instead of
removing.).  Let's try `sparse-checkout clean` now...

$ git sparse-checkout clean
Removing folder2/
$ git status --porcelain
$ ls folder2/
ls: cannot access 'folder2/': No such file or directory
$

I think these cases either need to be a new testcase or part of this
last testcase, and the commit message and documentation should be
clearer about tracked-and-staged, tracked-with-unstaged-changes, and
tracked-with-no-changes files...or at least comment that they'll be
discussed later in the patch series.  (I have a feeling I just did a
lot of work to discover as I read your next patches that you cover
these later...)





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux