[no subject]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



And if neither of my guesses are what you meant by this sentence,
please do clue me in.

> >> This is a detail that is partly
> >> hidden by the implementation which relies on collapsing the index to a
> >> sparse index in-memory and only deleting directories that are listed as
> >> sparse in the index. If a staged change exists, then that entry is not
> >> stored as a sparse tree entry and thus remains on-disk until committed
> >> or reset.
> > In contrast, the files under the sparse directory with unstaged
> > changes would be problematic to simply remove.
> Except that a user is only using this command when they want
> files outside of the sparse-checkout to be deleted.
>
> I'd like to find the right way to make it clear to users who
> discover this command that they are asking for the following:
>
>   "I changed my sparse-checkout and some directories that I
>    expected to be deleted are still around. Delete them as I
>    don't care about them or the files inside anymore."
>
> Some of the discussion around having a --verbose option (in
> conjunction with --dry-run) would allow for the following
> user scenario:
>
>   "I changed my sparse-checkout and some directories that I
>    expected to be deleted are still around. Which files are
>    preventing that deletion? I'd like to know what's in the
>    way so I can evaluate if those files are important to me."

Yes, and this might even be a status-like output, showing whether the
files are untracked, ignored, tracked-and-unmodified, or
tracked-and-modified.

> >> +'clean'::
> >> +       Remove all files in tracked directories that are outside of the
> >> +       sparse-checkout definition.
> >
> > If literal, this sounds unsafe, particularly if run while resolving
> > merge or rebase conflicts (since those conflicts may occur in paths
> > outside the sparse checkout definition).
>
> If we are in a merge-conflict state, the directory is not
> collapsed in the sparse index..

Ah, good to know.

> >> +                                                    This subcommand requires cone-mode
> >> +       sparse-checkout to be sure that we know which directories are
> >> +       both tracked and all contained paths are not in the sparse-checkout.
> >> +       This command can be used to be sure the sparse index works
> >> +       efficiently.
> >
> > So...what does it do when in cone mode and the sparse index is not enabled?
>
> It doesn't effect the behavior, since we don't care about the on-disk
> format and instead use an in-memory sparse index to determine which
> directories to delete.
>
> There could be a benefit for users wanting to clean up extra files in
> their worktree even if they are not using a sparse index. It is less
> likely that they will discover that they are in that state if they
> are not pestered by the index expansion advice message.

Right, for cone mode without the sparse index turned on, this new
subcommand seems to be a silent no-op (other than burning some
computation time), despite the fact that the wording in the manual
might lead users to believe it will do some nice tidying for them.
When the command spends some time working but doesn't do anything and
doesn't report anything, the user might then think the command is just
buggy.  I think the command should probably either (a) do some tidying
for them, or (b) give them a warning that the command has no effect
when sparse index is not turned on.  Thoughts?

[...]
> >> +               printf("%s\n", ce->name);
> >> +               if (!clean_opts.dry_run) {
> >> +                       if (remove_dir_recursively(&full_path, 0))
> >> +                               warning_errno(_("failed to remove '%s'"), ce->name);
> >> +               }
> >
> > ...and then unconditionally remove the directory, as you stated in the
> > documentation for this clean option.
> >
> > I'm worried whether this is safe; if someone does a merge or rebase,
> > there could be tracked-and-modified/conflicted files outside the
> > sparse specification in the working tree.
>
> The conflicted files will not collapse to sparse directory entries.
>
> Does that ease your concern on that front?

Yes, that does ease my concerns...but it doesn't erase them.

If someone resolves the conflicted merge or rebase and commits (long
before running this `git sparse-checkout clean` command), what happens
to those paths?  Do these materialized paths persist in the worktree
after the commit?  I know they did at some point in our
implementation, and the current wording of `git sparse-checkout
reapply` in the manual certainly suggests such paths may stick around
until the user takes manual action:

           Reapply the sparsity pattern rules to paths in the working tree.
           Commands like merge or rebase can materialize paths to do their
           work (e.g. in order to show you a conflict), and other
           sparse-checkout commands might fail to sparsify an individual file
           (e.g. because it has unstaged changes or conflicts). In such cases,
           it can make sense to run git sparse-checkout reapply later after
           cleaning up affected paths (e.g. resolving conflicts, undoing or
           committing changes, etc.).

Now, if these paths stick around despite being outside the sparsity
specification, users may decide to modify them.  And if they have
modified them and run your command, won't you succeed in collapsing to
a sparse directory tree?  And then wouldn't this cause unstaged
changes to be discarded?  (I know this is a rare case, because users
would probably only merge or rebase changes they made while in their
sparse-checkout, and thus conflicts would generally be limited to the
sparse-checkout, but there are at least three ways I can think of that
conflicts could be triggered outside their sparse checkout, so I think
it's a realistic scenario that we should think through.)

Throwing away unstaged changes is something that is usually gated
behind a forcing flag (e.g. `git reset --hard` or `git checkout
--force`).  I know, we currently also gate removal of untracked files
in `git clean` behind a `--force` flag as well and I'm not concerned
with throwing away untracked files with your command without a forcing
flag, but I'm just wondering if we should be more careful with
unstaged changes than with untracked files.

Perhaps everyone is fine with also throwing away unstaged changes as
part of this command.  I could be convinced of that.  But if so, that
should at least be called out rather explicitly in the commit message,
the documentation, and the tests, whereas currently your patch is
silent on anything other than untracked and ignored files.

(Or, if my memory is out-of-date about materialized paths persisting
after their conflicts are resolved and a commit happens, then it'd
probably be worth calling that out in the commit message and perhaps
also updating some wording on the reapply subcommand.)

> > Even after resolving such a merge and committing, the paths may remain
> > around until the user does a 'git sparse-checkout reapply' (I don't
> > remember details here, but our documentation for reapply certainly
> > says so), and since the file might stick around, the user may make
> > further modifications to such a file.
> >
> > ...or will the convert_to_sparse() call above fail in all these cases?
> >  If it does, should it give a better and more useful error message
> > than "failed to convert index to a sparse index" and rather e.g. "path
> > %s has modifications; please stage or revert first"?
>
> It won't fail. It just won't collapse as far.

Oh!  Based on this hint, I went and looked up the code for this; it's
from convert_to_sparse_rec(), right?  I see something interesting
there; does the present-despite-skipped checks (from 82386b44963f
(Merge branch 'en/present-despite-skipped', 2022-03-09)) cause this
collapsing to also fail for unstaged entries?  I.e. this part of
convert_to_sparse_rec():

                if (ce_stage(ce) ||
                    S_ISGITLINK(ce->ce_mode) ||
                    !(ce->ce_flags & CE_SKIP_WORKTREE))
                        can_convert = 0;

The `ce_stage(ce)` part of it is what prevent it from collapsing when
there are conflicts, and I think the `!(ce->ce_flags &
CE_SKIP_WORKTREE))` would prevent it from collapsing any tracked files
whatsoever, whether modified or not, due to the
present-despite-skipped checks.  Does that sound right?

In other words, perhaps your clean command as implemented really does
only handle untracked and ignored files, and if the user also has
tracked-but-unmodified or tracked-with-unstaged-changes or
tracked-with-staged-changes then this command won't actually restore
performance for them until they _also_ run `git sparse-checkout
reapply` ?

> You do make a good point that there could be extra help messages to say
> that there are uncollapsed directories (detectable by seeing a blob path
> with the skip-worktree bit on, maybe). I will think on this.

In order to detect this by the skip-worktree bit, you'd probably need
to run it as part of the present-despite-skipped checks mentioned
above (or run it before those checks clear the skip-worktree bit for
present files).





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux