Re: [PATCH v6 4/4] last-modified: use Bloom filters when available

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 30, 2025 at 07:55:10PM +0200, Toon Claes wrote:
> Our 'git last-modified' performs a revision walk, and computes a diff at
> each point in the walk to figure out whether a given revision changed
> any of the paths it considers interesting.
> 
> When changed-path Bloom filters are available, we can avoid computing
> many such diffs. Before computing a diff, we first check if any of the
> remaining paths of interest were possibly changed at a given commit by
> consulting its Bloom filter. If any of them are, we are resigned to
> compute the diff.
> 
> If none of those queries returned "maybe", we know that the given commit
> doesn't contain any changed paths which are interesting to us. So, we
> can avoid computing it in this case.
> 
> Comparing the perf test results on git.git:
> 
>     Test                                        HEAD~             HEAD
>     ------------------------------------------------------------------------------------
>     8020.1: top-level last-modified             4.49(4.34+0.11)   2.22(2.05+0.09) -50.6%
>     8020.2: top-level recursive last-modified   5.64(5.45+0.11)   5.62(5.30+0.11) -0.4%
>     8020.3: subdir last-modified                0.11(0.06+0.04)   0.07(0.03+0.04) -36.4%

Nice results.

> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> index e4c73464c7..19bf25f8a5 100644
> --- a/builtin/last-modified.c
> +++ b/builtin/last-modified.c
> @@ -179,6 +192,27 @@ static void last_modified_diff(struct diff_queue_struct *q,
>  	}
>  }
>  
> +static int maybe_changed_path(struct last_modified *lm, struct commit *origin)
> +{
> +	struct bloom_filter *filter;
> +	struct last_modified_entry *ent;
> +	struct hashmap_iter iter;
> +
> +	if (!lm->rev.bloom_filter_settings)
> +		return 1;
> +
> +	filter = get_bloom_filter(lm->rev.repo, origin);
> +	if (!filter)
> +		return 1;
> +
> +	hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
> +		if (bloom_filter_contains(filter, &ent->key,
> +					  lm->rev.bloom_filter_settings))
> +			return 1;
> +	}
> +	return 0;
> +}

This function is basically the same as `maybe_changed_paths()` in
"blame.c", but that isn't a huge issue from my point of view. What makes
me wonder though is why we have an additional check over there for
whether or not the commit has a valid generation number.

> @@ -227,6 +264,9 @@ static int last_modified_init(struct last_modified *lm, struct repository *r,
>  		return argc;
>  	}
>  
> +	prepare_commit_graph(lm->rev.repo);
> +	lm->rev.bloom_filter_settings = get_bloom_filter_settings(lm->rev.repo);
> +

So this here is why we export `prepare_commit_graph()`. How about we
instead expose `bloom_filters_enabled()` that mirrors what we do in
`generation_numbers_enabled()` and `corrected_commit_dates_enabled()`?
That would both be on a higher level and do exactly what we want to
achieve.

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux