Patrick Steinhardt <ps@xxxxxx> writes: > On Wed, Jul 30, 2025 at 07:55:10PM +0200, Toon Claes wrote: >> Our 'git last-modified' performs a revision walk, and computes a diff at >> each point in the walk to figure out whether a given revision changed >> any of the paths it considers interesting. >> >> When changed-path Bloom filters are available, we can avoid computing >> many such diffs. Before computing a diff, we first check if any of the >> remaining paths of interest were possibly changed at a given commit by >> consulting its Bloom filter. If any of them are, we are resigned to >> compute the diff. >> >> If none of those queries returned "maybe", we know that the given commit >> doesn't contain any changed paths which are interesting to us. So, we >> can avoid computing it in this case. >> >> Comparing the perf test results on git.git: >> >> Test HEAD~ HEAD >> ------------------------------------------------------------------------------------ >> 8020.1: top-level last-modified 4.49(4.34+0.11) 2.22(2.05+0.09) -50.6% >> 8020.2: top-level recursive last-modified 5.64(5.45+0.11) 5.62(5.30+0.11) -0.4% >> 8020.3: subdir last-modified 0.11(0.06+0.04) 0.07(0.03+0.04) -36.4% > > Nice results. > >> diff --git a/builtin/last-modified.c b/builtin/last-modified.c >> index e4c73464c7..19bf25f8a5 100644 >> --- a/builtin/last-modified.c >> +++ b/builtin/last-modified.c >> @@ -179,6 +192,27 @@ static void last_modified_diff(struct diff_queue_struct *q, >> } >> } >> >> +static int maybe_changed_path(struct last_modified *lm, struct commit *origin) >> +{ >> + struct bloom_filter *filter; >> + struct last_modified_entry *ent; >> + struct hashmap_iter iter; >> + >> + if (!lm->rev.bloom_filter_settings) >> + return 1; >> + >> + filter = get_bloom_filter(lm->rev.repo, origin); >> + if (!filter) >> + return 1; >> + >> + hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) { >> + if (bloom_filter_contains(filter, &ent->key, >> + lm->rev.bloom_filter_settings)) >> + return 1; >> + } >> + return 0; >> +} > > This function is basically the same as `maybe_changed_paths()` in > "blame.c", but that isn't a huge issue from my point of view. What makes > me wonder though is why we have an additional check over there for > whether or not the commit has a valid generation number. I've been asking me the same question. And I couldn't find a good reason (neither from the commit history, or from my reasoning). This check was in the version shared by Taylor, but because we were ignoring the return value from generation_numbers_enabled() in that version, it didn't make sense to me to do this check. That's why I removed it. >> @@ -227,6 +264,9 @@ static int last_modified_init(struct last_modified *lm, struct repository *r, >> return argc; >> } >> >> + prepare_commit_graph(lm->rev.repo); >> + lm->rev.bloom_filter_settings = get_bloom_filter_settings(lm->rev.repo); >> + > > So this here is why we export `prepare_commit_graph()`. How about we > instead expose `bloom_filters_enabled()` that mirrors what we do in > `generation_numbers_enabled()` and `corrected_commit_dates_enabled()`? > That would both be on a higher level and do exactly what we want to > achieve. I've got another proposal, what if we let get_bloom_filter_settings() call prepare_commit_graph()? Functions like repo_find_commit_pos_in_graph() and lookup_commit_in_graph() do this too. -- Cheers, Toon