Taylor Blau <me@xxxxxxxxxxxx> writes: > On Wed, Jul 16, 2025 at 03:35:15PM +0200, Toon Claes wrote: >> Our 'git last-modified' performs a revision walk, and computes a diff at >> each point in the walk to figure out whether a given revision changed >> any of the paths it considers interesting. >> >> When changed-path Bloom filters are available, we can avoid computing >> many such diffs. Before computing a diff, we first check if any of the >> remaining paths of interest were possibly changed at a given commit by >> consulting its Bloom filter. If any of them are, we are resigned to >> compute the diff. >> >> If none of those queries returned "maybe", we know that the given commit >> doesn't contain any changed paths which are interesting to us. So, we >> can avoid computing it in this case. >> >> Comparing the perf test results on git.git: >> >> Test HEAD~ HEAD >> ------------------------------------------------------------------------------------ >> 8020.1: top-level last-modified 4.49(4.34+0.11) 2.22(2.05+0.09) -50.6% >> 8020.2: top-level recursive last-modified 5.64(5.45+0.11) 5.62(5.30+0.11) -0.4% >> 8020.3: subdir last-modified 0.11(0.06+0.04) 0.07(0.03+0.04) -36.4% > > As an aside on 8020.3 (that I probably should have mentioned in the last > commit), I think that our "| head -n1" heuristic for picking a sub-tree > is skewing these results down. In git.git, the lexicographically > earliest sub-tree is ".github", which is awfully tiny. I wonder if we > should be grabbing the *last* sub-tree, or maybe the largest one by > count of entries? Ah yes, that's automatable. Good suggestion. > >> @@ -40,6 +43,12 @@ struct last_modified { >> >> static void last_modified_release(struct last_modified *lm) >> { >> + struct hashmap_iter iter; >> + struct last_modified_entry *ent; >> + >> + hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) >> + clear_bloom_key(&ent->key); >> + > > I did a double-take to make sure that ent->key would always be > initialized here, but it is thanks to the FLEX_ALLOC_STR() call below. > >> hashmap_clear_and_free(&lm->paths, struct last_modified_entry, hashent); >> release_revisions(&lm->rev); >> } >> @@ -67,6 +76,9 @@ static void add_path_from_diff(struct diff_queue_struct *q, >> >> FLEX_ALLOC_STR(ent, path, path); >> oidcpy(&ent->oid, &p->two->oid); >> + if (lm->rev.bloom_filter_settings) >> + fill_bloom_key(path, strlen(path), &ent->key, >> + lm->rev.bloom_filter_settings); >> hashmap_entry_init(&ent->hashent, strhash(ent->path)); >> hashmap_add(&lm->paths, &ent->hashent); >> } >> @@ -126,6 +138,7 @@ static void mark_path(const char *path, const struct object_id *oid, >> data->callback(path, data->commit, data->callback_data); >> >> hashmap_remove(data->paths, &ent->hashent, path); >> + clear_bloom_key(&ent->key); > > OK, we're calling clear_bloom_key() here, too, but it uses > FREE_AND_NULL(), so calling it again in last_modified_release() may be a > noop, but won't ever be a double-free. > >> @@ -238,6 +276,13 @@ static int last_modified_init(struct last_modified *lm, struct repository *r, >> return argc; >> } >> >> + /* >> + * We're not interested in generation numbers here, >> + * but calling this function to prepare the commit-graph. >> + */ >> + (void)generation_numbers_enabled(lm->rev.repo); >> + lm->rev.bloom_filter_settings = get_bloom_filter_settings(lm->rev.repo); > > Hmmph. I think when I originally wrote this I was using the side-effect > of calling generation_numbers_enabled() as a hack. But I think that it > may be worth making "prepare_commit_graph()" a non-static function and > calling that instead. Well thanks for confirming this assumption. I wasn't 100%, and I agree having a non-static "prepare_commit_graph()" would be better. Somewhat related to that. In your branch you have this line in "maybe_changed_path()": if (commit_graph_generation(origin) == GENERATION_NUMBER_INFINITY) return 1; I excluded this line because if we ignore the result of "generation_numbers_enabled()", why would it matter to have a generation number? The thing is "maybe_changed_path()" in blame.c also has this line too, but unfortunately git-blaming that line didn't learn me anything why it's there. Do you have any idea? -- Cheers, Toon