Re: [PATCH] commit: avoid parent list buildup in clear_commit_marks_many()

Patrick Steinhardt <ps@xxxxxx> · Mon, 17 Feb 2025 07:33:07 +0100

On Thu, Feb 13, 2025 at 10:38:51PM +0100, René Scharfe wrote:
> Am 12.02.25 um 08:05 schrieb Patrick Steinhardt:
> > And this is fixed by immediately processing all commits that we
> > currently have in the list. As `clear_commit_marks_1()` only processes
> > immediate children of the handed-in commit we know that it will have
> > processed the first parent, and `list` will contain remaining commits,
> > if any.
> 
> clear_commit_marks_1() processes the whole ancestral chain of first
> parents down to the root or first clean ancestor.
> 
> > We also end up adding grandchildren to the list, so this change
> > essentially converts the algorithm from depth-first to breadth-first.
> 
> It's still depth-first, but all second and higher numbered parents added
> to the list are cleaned before starting to clean the next commit from
> the input list.  So we go from "clean straight down for each input
> commit first and clean their side branches later" to "clean all
> ancestors for each input commit".

Ah, fair.

> > I bet we can construct cases where this will perform _worse_ than the
> > current algorithm, e.g. when you have branch thickets where every commit
> > is a merge: But I assume that for the most common cases this might be an
> > improvement indeed.
> 
> I won't bet, but I'd like to see such a case.  Can't imagine one.
> 
> > The question to me is: does this actually matter in the real world? It
> > would be nice to maybe get some numbers that demonstrate the improvement
> > in a repository.
> 
> Well, the maximum list length for clear_commit_marks_many() calls with
> nr > 1 in the test suite goes from 12 in t6600 to 4 with the patch.  Not
> that exciting.  The question to me is: Why pile up parents in the list
> when we can clean them earlier with no downside?  Or is there any?

If it really is without downsides then yes, it's a nice improvement. I
was mostly wondering whether you're fixing something where the old way
of doing things performs _significantly_ worse and where the change
could lead to a user-visible improvement. Like, requiring us to store
orders of magnitudes less commits at the same time.

Patrick