On Sun, Feb 09, 2025 at 11:41:15AM +0100, René Scharfe wrote: > clear_commit_marks_1() clears the marks of the first parent and its > first parent and so on, and saves the higher numbered parents in a list > for later. There is no benefit in keeping that list growing with each > handled commit. Clear it after each run to reduce peak memory usage. Okay. So the issue here is that `clear_commit_marks_1()` only processes the first-parent chain of commits, and any other commits will be added to the `struct commit_list` backlog. Consequently, merge-heavy histories are very likely to build up quite a backlog of non-first-parent commits. > Signed-off-by: René Scharfe <l.s.r@xxxxxx> > --- > commit.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/commit.c b/commit.c > index 540660359d..6efdb03997 100644 > --- a/commit.c > +++ b/commit.c > @@ -780,14 +780,14 @@ static void clear_commit_marks_1(struct commit_list **plist, > > void clear_commit_marks_many(size_t nr, struct commit **commit, unsigned int mark) > { > - struct commit_list *list = NULL; > - > for (size_t i = 0; i < nr; i++) { > + struct commit_list *list = NULL; > + > clear_commit_marks_1(&list, *commit, mark); > + while (list) > + clear_commit_marks_1(&list, pop_commit(&list), mark); > commit++; > } > - while (list) > - clear_commit_marks_1(&list, pop_commit(&list), mark); > } And this is fixed by immediately processing all commits that we currently have in the list. As `clear_commit_marks_1()` only processes immediate children of the handed-in commit we know that it will have processed the first parent, and `list` will contain remaining commits, if any. We also end up adding grandchildren to the list, so this change essentially converts the algorithm from depth-first to breadth-first. I bet we can construct cases where this will perform _worse_ than the current algorithm, e.g. when you have branch thickets where every commit is a merge: But I assume that for the most common cases this might be an improvement indeed. The question to me is: does this actually matter in the real world? It would be nice to maybe get some numbers that demonstrate the improvement in a repository. Patrick