On 8/24/2025 3:06 PM, SZEDER Gábor wrote: > In process_ranges_merge_commit(), the line-level log first creates an > array of diff queues by iterating over all parents of a merge commit > and computing a tree diff for each. Then in a second loop it iterates > over those diff queues, and if it finds that none of the interesting > paths were modified in one of them, then it will return early. This > means that when none of the interesting paths were modified between a > merge and its first parent, then the tree diff between the merge and > its second (Nth...) parent was computed in vain. Great find! This goes all the way back to the original implementation in 12da1d1f6f (Implement line-history search (git log -L), 2013-03-28) where this detail could easily be missed in the rest of the scaffolding to implement the feature. This is an understandable mistake to make as it can take some time to understand Git's simplified history computation and how it short- circuits these merge diffs in most cases. > Summary > './git -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0' ran > 2.25 ± 0.03 times faster than './git_v2.51.0 -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0' > Summary > './git -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16' ran > 1.32 ± 0.01 times faster than './git_v2.51.0 -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16' Great stats! > And since now each iteration computes a tree diff and processes its > result, there is no reason to store the diff queues for each merge > parent anymore, so replace that diff queue array with a loop-local > diff queue variable. With this change the static free_diffqueues() > helper function in 'line-log.c' has no more callers left, remove it. > > Signed-off-by: SZEDER Gábor <szeder.dev@xxxxxxxxx> > --- > line-log.c | 20 +++++--------------- > 1 file changed, 5 insertions(+), 15 deletions(-) > > diff --git a/line-log.c b/line-log.c > index 07f2154e84..cf30915c94 100644 > --- a/line-log.c > +++ b/line-log.c > @@ -1087,13 +1087,6 @@ static struct diff_filepair *diff_filepair_dup(struct diff_filepair *pair) > return new_filepair; > } > > -static void free_diffqueues(int n, struct diff_queue_struct *dq) > -{ > - for (int i = 0; i < n; i++) > - diff_queue_clear(&dq[i]); > - free(dq); > -} > - > static int process_all_files(struct line_log_data **range_out, > struct rev_info *rev, > struct diff_queue_struct *queue, > @@ -1209,7 +1202,6 @@ static int process_ranges_ordinary_commit(struct rev_info *rev, struct commit *c > static int process_ranges_merge_commit(struct rev_info *rev, struct commit *commit, > struct line_log_data *range) > { > - struct diff_queue_struct *diffqueues; > struct line_log_data **cand; > struct commit **parents; > struct commit_list *p; > @@ -1220,20 +1212,19 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm > if (nparents > 1 && rev->first_parent_only) > nparents = 1; > > - ALLOC_ARRAY(diffqueues, nparents); > CALLOC_ARRAY(cand, nparents); > ALLOC_ARRAY(parents, nparents); > > p = commit->parents; > for (i = 0; i < nparents; i++) { > + struct diff_queue_struct diffqueue = DIFF_QUEUE_INIT; > + int changed; > parents[i] = p->item; > p = p->next; > - queue_diffs(range, &rev->diffopt, &diffqueues[i], commit, parents[i]); > - } > + queue_diffs(range, &rev->diffopt, &diffqueue, commit, parents[i]); > > - for (i = 0; i < nparents; i++) { > - int changed; > - changed = process_all_files(&cand[i], rev, &diffqueues[i], range); > + changed = process_all_files(&cand[i], rev, &diffqueue, range); > + diff_queue_clear(&diffqueue); > if (!changed) { > /* > * This parent can take all the blame, so we > @@ -1267,7 +1258,6 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm > free(cand[i]); > } > free(cand); > - free_diffqueues(nparents, diffqueues); > return ret; > > /* NEEDSWORK evil merge detection stuff */ This diff is a lot cleaner than I expected it to be. Excellent! I applied this patch locally and tested it on a few repos I have to give extra confidence to your patch. For an internal monorepo, I was able to measure these results: Benchmark 1: old Time (mean ± σ): 19.709 s ± 0.014 s [User: 18.846 s, System: 0.862 s] Range (min … max): 19.681 s … 19.725 s 10 runs Benchmark 2: new Time (mean ± σ): 9.061 s ± 0.015 s [User: 8.487 s, System: 0.574 s] Range (min … max): 9.042 s … 9.089 s 10 runs Summary 'new' ran 2.18 ± 0.00 times faster than 'old' I did also want to check to see the impact of f32dde8c12 (line-log: integrate with changed-path Bloom filters, 2020-05-11), and having computed filters diminished the size of your impact somewhat: Your Git example: Benchmark 1: old Time (mean ± σ): 279.2 ms ± 2.2 ms [User: 231.0 ms, System: 47.9 ms] Range (min … max): 275.5 ms … 282.6 ms 10 runs Benchmark 2: new Time (mean ± σ): 242.4 ms ± 3.6 ms [User: 191.8 ms, System: 50.4 ms] Range (min … max): 237.3 ms … 249.9 ms 12 runs Summary 'new ' ran 1.15 ± 0.02 times faster than 'old' Your Linux example: Benchmark 1: old Time (mean ± σ): 1.694 s ± 0.008 s [User: 1.524 s, System: 0.169 s] Range (min … max): 1.688 s … 1.714 s 10 runs Benchmark 2: new Time (mean ± σ): 1.644 s ± 0.008 s [User: 1.482 s, System: 0.161 s] Range (min … max): 1.636 s … 1.663 s 10 runs Summary 'new ' ran 1.03 ± 0.01 times faster than 'old' My internal monorepo example: Benchmark 1: old Time (mean ± σ): 3.749 s ± 0.007 s [User: 3.188 s, System: 0.559 s] Range (min … max): 3.736 s … 3.759 s 10 runs Benchmark 2: new Time (mean ± σ): 2.713 s ± 0.005 s [User: 2.318 s, System: 0.394 s] Range (min … max): 2.706 s … 2.723 s 10 runs Summary 'new' ran 1.38 ± 0.00 times faster than 'old' Rerunning with "-c commitGraph.readChangedPaths=false" resulted in numbers closer to your examples (940ms->420ms for Git and 2.6->2.1s for Linux). I expect most users to be in the situation where there are no changed-path Bloom filters, so this is very good to deliver that value. At first, I found this to be concerning: we only store filters for the diff between a commit and its first parent, so this cost of visiting the later parents should be _much worse_ in those cases. However, it turns out that the way that the filters are handled in line_log_process_ranges_arbitrary_commit() avoids a call to process_ranges_merge_commit() if the filter says that the first parent is TREESAME on the given path. This means that a good chunk of the performance benefits in f32dde8c12 (line-log: integrate with changed-path Bloom filters, 2020-05-11) are _actually_ due to avoiding this extra work for multiple parents. Thanks for digging in and bringing this benefit to all users! Thanks, -Stolee