On Wed, Apr 30, 2025 at 09:58:13AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@xxxxxx> writes: > > > We're about to add another task for git-maintenance(1) that prunes stale > > rerere entries via `git rerere gc`. The condition of when to run this > > subcommand will be configurable so that the subcommand is only executed > > when a certain number of stale rerere entries exists. This requires us > > to know about the number of stale rerere entries in the first place, > > which is non-trivial to figure out. > > > > Refactor `rerere_gc()` and `prune_one()` so that garbage collection is > > split into three phases: > > > > 1. We collect any stale rerere entries and directories that are about > > to become empty. > > > > 2. Prune all stale rerere entries. > > > > 3. Remove all directories that should have become empty in (2). > > > > By splitting out the collection of stale entries we can trivially expose > > this function to external callers and thus reuse it in later steps. > > > > This refactoring is not expected to result in a user-visible change in > > behaviour. > > I have no objection against the goal of allowing "git maintenance" > drive "git rerere gc", and as the primary author of this code path I > do not see anything wrong, in the "correctness" sense, in the > updated code. > > I however am not sure if "count what we would prune, and remove only > when there are too many" would work well for this subsystem, because > I expect that the cost to enumerate existing rerere entries and > check each of them for staleness would be the dominant part, > relative to actual "rm -fr <rerere-id>", of the cost you are paying > when you run "git rerere gc". > > And if my suspicion is correct, all this change does to the plain > vanilla user of "git rerere gc" is to have them pay the extra cost > of allocating and deallocating the list of names of paths in string > lists. Yeah, I think this concern makes sense indeed. I was a bit sceptical myself whether this is going too far. Maybe a simpler solution would be to just count the number of directories in ".git/rr-cache", without checking whether those actually are prunable? We could also adapt this to be closer to the original version, where we only verified that ".git/rr-cache" exists and contains at least one subdirectory. This can even be combined with the above approach if we set "maintenance.rerere-gc.auto=1" by default. > We need to see some performance measurement to show that the we pay > for collection and counting is a lot smaller compared to the whole > pruning operation to justify the "auto" thing. Hm. I guess ultimately the answer is going to be "it depends". The performance implication on Windows is going to be quite different compared to the performance on Linux/macOS. In any case, let's go with the simpler model for now. We can still iterate as needed if we eventually see that the heuristic is too dumb to be useful. Patrick