On Wed, Jul 30, 2025 at 07:55:10PM +0200, Toon Claes wrote: > Our 'git last-modified' performs a revision walk, and computes a diff at > each point in the walk to figure out whether a given revision changed > any of the paths it considers interesting. > > When changed-path Bloom filters are available, we can avoid computing > many such diffs. Before computing a diff, we first check if any of the > remaining paths of interest were possibly changed at a given commit by > consulting its Bloom filter. If any of them are, we are resigned to > compute the diff. > > If none of those queries returned "maybe", we know that the given commit > doesn't contain any changed paths which are interesting to us. So, we > can avoid computing it in this case. > > Comparing the perf test results on git.git: > > Test HEAD~ HEAD > ------------------------------------------------------------------------------------ > 8020.1: top-level last-modified 4.49(4.34+0.11) 2.22(2.05+0.09) -50.6% > 8020.2: top-level recursive last-modified 5.64(5.45+0.11) 5.62(5.30+0.11) -0.4% > 8020.3: subdir last-modified 0.11(0.06+0.04) 0.07(0.03+0.04) -36.4% Nice results. > diff --git a/builtin/last-modified.c b/builtin/last-modified.c > index e4c73464c7..19bf25f8a5 100644 > --- a/builtin/last-modified.c > +++ b/builtin/last-modified.c > @@ -179,6 +192,27 @@ static void last_modified_diff(struct diff_queue_struct *q, > } > } > > +static int maybe_changed_path(struct last_modified *lm, struct commit *origin) > +{ > + struct bloom_filter *filter; > + struct last_modified_entry *ent; > + struct hashmap_iter iter; > + > + if (!lm->rev.bloom_filter_settings) > + return 1; > + > + filter = get_bloom_filter(lm->rev.repo, origin); > + if (!filter) > + return 1; > + > + hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) { > + if (bloom_filter_contains(filter, &ent->key, > + lm->rev.bloom_filter_settings)) > + return 1; > + } > + return 0; > +} This function is basically the same as `maybe_changed_paths()` in "blame.c", but that isn't a huge issue from my point of view. What makes me wonder though is why we have an additional check over there for whether or not the commit has a valid generation number. > @@ -227,6 +264,9 @@ static int last_modified_init(struct last_modified *lm, struct repository *r, > return argc; > } > > + prepare_commit_graph(lm->rev.repo); > + lm->rev.bloom_filter_settings = get_bloom_filter_settings(lm->rev.repo); > + So this here is why we export `prepare_commit_graph()`. How about we instead expose `bloom_filters_enabled()` that mirrors what we do in `generation_numbers_enabled()` and `corrected_commit_dates_enabled()`? That would both be on a higher level and do exactly what we want to achieve. Patrick