Re: [PATCH v2 6/8] rerere: provide function to collect stale entries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 30, 2025 at 09:58:13AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
> 
> > We're about to add another task for git-maintenance(1) that prunes stale
> > rerere entries via `git rerere gc`. The condition of when to run this
> > subcommand will be configurable so that the subcommand is only executed
> > when a certain number of stale rerere entries exists. This requires us
> > to know about the number of stale rerere entries in the first place,
> > which is non-trivial to figure out.
> >
> > Refactor `rerere_gc()` and `prune_one()` so that garbage collection is
> > split into three phases:
> >
> >   1. We collect any stale rerere entries and directories that are about
> >      to become empty.
> >
> >   2. Prune all stale rerere entries.
> >
> >   3. Remove all directories that should have become empty in (2).
> >
> > By splitting out the collection of stale entries we can trivially expose
> > this function to external callers and thus reuse it in later steps.
> >
> > This refactoring is not expected to result in a user-visible change in
> > behaviour.
> 
> I have no objection against the goal of allowing "git maintenance"
> drive "git rerere gc", and as the primary author of this code path I
> do not see anything wrong, in the "correctness" sense, in the
> updated code.
> 
> I however am not sure if "count what we would prune, and remove only
> when there are too many" would work well for this subsystem, because
> I expect that the cost to enumerate existing rerere entries and
> check each of them for staleness would be the dominant part,
> relative to actual "rm -fr <rerere-id>", of the cost you are paying
> when you run "git rerere gc".
> 
> And if my suspicion is correct, all this change does to the plain
> vanilla user of "git rerere gc" is to have them pay the extra cost
> of allocating and deallocating the list of names of paths in string
> lists.

Yeah, I think this concern makes sense indeed. I was a bit sceptical
myself whether this is going too far. Maybe a simpler solution would be
to just count the number of directories in ".git/rr-cache", without
checking whether those actually are prunable?

We could also adapt this to be closer to the original version, where we
only verified that ".git/rr-cache" exists and contains at least one
subdirectory. This can even be combined with the above approach if we
set "maintenance.rerere-gc.auto=1" by default.

> We need to see some performance measurement to show that the we pay
> for collection and counting is a lot smaller compared to the whole
> pruning operation to justify the "auto" thing.

Hm. I guess ultimately the answer is going to be "it depends". The
performance implication on Windows is going to be quite different
compared to the performance on Linux/macOS.

In any case, let's go with the simpler model for now. We can still
iterate as needed if we eventually see that the heuristic is too dumb to
be useful.

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux