Re: [PATCH 2/3] sparse-checkout: add 'clean' command

On 7/8/2025 5:20 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> 
>> From: Derrick Stolee <stolee@xxxxxxxxx>
>>
>> When users change their sparse-checkout definitions to add new
>> directories and remove old ones, there may be a few reasons why
>> directories no longer in scope remain (ignored or excluded files still
>> exist, Windows handles are still open, etc.). When these files still
>> exist, the sparse index feature notices that a tracked, but sparse,
>> directory still exists on disk and thus the index expands. This causes a
>> performance hit _and_ the advice printed isn't very helpful. Using 'git
>> clean' isn't enough (generally '-dfx' may be needed) but also this may
>> not be sufficient.
>>
>> Add a new subcommand to 'git sparse-checkout' that removes these
>> tracked-but-sparse directories, including any excluded or ignored files
> 
> Do excluded files and ignored files form two separate sets, or are
> they one and the same?  Are files that users forgot to add (e.g. a
> new source file that would not match any patterns listed in .gitignore)
> and object files left over from the previous compilation (most
> likely matching *.o in .gitignore) treated the same way for the purpose
> of determining if the directory that is no longer in the cone can be
> removed?

I think of them as separate in my head because:

* .gitignore is committed to the repo, and is common to all users of
  the repo.

* .git/info/exclude is custom to each user, so users choose to
  ignore extra files that are atypical for most users.

In the monorepo I'm thinking about, .gitignore files are rather small
because all build output has already been redirected out of the
worktree for performance reasons. Thus, _most_ users don't have this
problem. However, some users add extra excludes for things like vim
files, and those get left behind, causing pain that is invisible to
'git status'.
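A minimal sketch of that distinction, assuming hypothetical pattern lists and a simplified matcher (basename-only fnmatch, with no negation, anchoring, or "**" support): it attributes a leftover file to .gitignore, to .git/info/exclude, or to neither. Against a real repository, 'git check-ignore -v <path>' reports the actual source file and pattern.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical pattern lists; real gitignore matching has more rules
# (negation, anchoring, "**", directory-only patterns) than fnmatch.
GITIGNORE = ["*.o"]             # committed, shared by every user
INFO_EXCLUDE = ["*.swp", "*~"]  # per-user, e.g. vim swap and backup files


def matches(path: str, patterns: list[str]) -> bool:
    """Simplified match: compare each pattern against the basename only."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, p) for p in patterns)


def classify(path: str) -> str:
    """Report which source (if any) makes a leftover file ignored.

    .gitignore patterns take precedence over .git/info/exclude,
    so they are checked first.
    """
    if matches(path, GITIGNORE):
        return ".gitignore"
    if matches(path, INFO_EXCLUDE):
        return "info/exclude"
    return "untracked"


for f in ["old/dir/.main.c.swp", "old/dir/main.o", "old/dir/new.c"]:
    print(f"{f}: {classify(f)}")
```

In this model only the first file is "invisible pain": it is ignored solely because of one user's .git/info/exclude.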

>> underneath. This is the most extreme method for doing this, but it works
>> when the sparse-checkout is in cone mode and is expected to rescope
>> based on directories, not files.
>>
>> Be sure to add a --dry-run option so users can predict what will be
>> deleted. In general, output the directories that are being removed so
>> users can know what was removed.
> 
> Hmph.  It would be safer to show not just the directories but which
> excluded files are about to be lost, wouldn't it, especially when
> the user is trying to play safe and see what potential damage they
> are looking at?
>
> Also even though ignored files are "ignored and expendable", nobody
> marks their temporary file as "ignored but precious" (yet), so "it
> is listed in .gitignore so we can safely remove it" may not be a
> safe assumption for us to be making (yet).  Shouldn't we at least be
> listing these ignored files in --dry-run output, next to those files
> that the user may have forgotten to add?

I considered this, but mostly behind a potential --verbose option to
list the files that are left over. Much of the design here is that
these _directories_ are out of scope, skipping over any details about
the contained files, so I thought this directory-based output would
communicate enough information.

A curious user may want to know "why are these directories still
around?" and the more verbose output would assist.
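To make the output trade-off concrete, here is a minimal sketch, not the patch's implementation, of directory-level output with an optional per-file listing. It assumes, as the commit message describes, that a directory collapses to a sparse entry only when nothing under it is staged. The function names, the --verbose behavior, and the message format are hypothetical; the "Would remove" wording is borrowed from 'git clean -n'.

```python
def collapse_to_sparse_dirs(out_of_cone_dirs, staged_paths):
    """Keep only directories that would collapse to a sparse tree entry.

    A directory with any staged change underneath it stays expanded,
    so it is never treated as sparse (and never removed).
    """
    result = []
    for d in out_of_cone_dirs:
        prefix = d.rstrip("/") + "/"  # trailing slash avoids "old/a" matching "old/ab"
        if not any(p.startswith(prefix) for p in staged_paths):
            result.append(prefix)
    return result


def plan_clean(sparse_dirs, files_on_disk, dry_run=True, verbose=False):
    """Return output lines: one per directory, plus its files if verbose."""
    verb = "Would remove" if dry_run else "Removing"
    lines = []
    for d in sparse_dirs:
        leftovers = sorted(f for f in files_on_disk if f.startswith(d))
        if not leftovers:
            continue  # directory already gone from disk; nothing to report
        lines.append(f"{verb} {d}")
        if verbose:
            lines.extend(f"    {f}" for f in leftovers)
    return lines


sparse = collapse_to_sparse_dirs(["old/a", "old/b"], staged_paths=["old/b/new.c"])
on_disk = ["old/a/.x.swp", "old/a/build.log", "old/b/new.c", "keep/main.c"]
for line in plan_clean(sparse, on_disk, verbose=True):
    print(line)
```

Here old/b/ is skipped because it holds a staged file, and the default output stays at one line per directory while the verbose form answers "why is this still here?".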

>> Note that untracked directories remain. Further, directories that
>> contain staged changes are not deleted. This is a detail that is partly
>> hidden by the implementation which relies on collapsing the index to a
>> sparse index in-memory and only deleting directories that are listed as
>> sparse in the index. If a staged change exists, then that entry is not
>> stored as a sparse tree entry and thus remains on-disk until committed
>> or reset.
> 
> Removing untracked directories is a job for "clean -d", so it makes
> sense for this new command not to touch them.  Losing changes
> that have already been added is just as bad as losing new files that
> the user forgot to add, so it does make sense not to remove them.
> 
> I wonder if we need "-x" and/or "-X" options "clean" has (and
> perhaps "-d" that is a no-op, as the whole point of this subcommand
> is about removing directories from the working tree) to control its
> operation in a finer-grained way.

I'm of two minds here.

My first inclination is "we already have 'git clean' for fine-grained
control of removing ignored/excluded files".

My second inclination is "'git clean' would remove these ignored files
even when they are within the sparse-checkout, so that's too big of a
hammer".
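A rough model of why 'git clean -dfx' is the bigger hammer, assuming a hypothetical layout where src/app/ is inside the cone and old/lib/ has collapsed to a sparse directory entry. It treats every untracked file as a 'git clean -dfx' candidate and only files under sparse directories as candidates for the new command; this is a simplification, not either command's actual logic.

```python
def git_clean_dfx_candidates(files_on_disk, tracked):
    """'git clean -dfx': any untracked file, ignored or not, anywhere."""
    return {f for f in files_on_disk if f not in tracked}


def sparse_clean_candidates(files_on_disk, sparse_dirs):
    """Proposed command: only files under a sparse directory entry."""
    return {f for f in files_on_disk
            if any(f.startswith(d) for d in sparse_dirs)}


tracked = {"src/app/main.c", "old/lib/util.c"}  # util.c is sparse, not on disk
on_disk = ["src/app/main.c", "src/app/.main.c.swp",
           "src/app/main.o", "old/lib/.util.c.swp"]
sparse_dirs = ["old/lib/"]

print(sorted(git_clean_dfx_candidates(on_disk, tracked)))
# ['old/lib/.util.c.swp', 'src/app/.main.c.swp', 'src/app/main.o']
print(sorted(sparse_clean_candidates(on_disk, sparse_dirs)))
# ['old/lib/.util.c.swp']
```

In this model, the swap file and object file inside the cone are lost to 'git clean -dfx' but left alone by the sparse-checkout command.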

There are a lot of ways to filter the files that would be removed,
but I think that in this case most users want a one-command way
to get their sparse-checkout into a better state.

I'm not making any final statements here. I appreciate all of the
thoughts around which behaviors should be the default and which should
be hidden behind options.

Thanks,
-Stolee




