Re: [PATCH v5.1 4/4] bloom: optimize multiple pathspec items in revision

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Derrick Stolee <stolee@xxxxxxxxx> wrote:
> 
> --- >8 ---
> 
> From fe255b1acfbe90fa8e4c335435ae18ee95e6243c Mon Sep 17 00:00:00 2001
> From: Lidong Yan <502024330056@xxxxxxxxxxxxxxxx>
> Date: Thu, 10 Jul 2025 08:04:34 -0400
> Subject: [PATCH v5.1 4/4] bloom: optimize multiple pathspec items in revision
> traversal
> 
> To enable optimize multiple pathspec items in revision traversal,
> return 0 if all pathspec item is literal in forbid_bloom_filters().
> Add for loops to initialize and check each pathspec item's bloom_keyvec
> when optimization is possible.
> 
> Add new test cases in t/t4216-log-bloom.sh to ensure
>  - consistent results between the optimization for multiple pathspec
>    items using bloom filter and the case without bloom filter
>    optimization.
>  - does not use bloom filter if any pathspec item is not literal.
> 
> With these optimizations, we get some improvements for multi-pathspec runs
> of 'git log'. First, in the Git repository we see these modest results:
> 
> Benchmark 1: old
>  Time (mean ± σ):      73.1 ms ±   2.9 ms
>  Range (min … max):    69.9 ms …  84.5 ms    42 runs
> 
> Benchmark 2: new
>  Time (mean ± σ):      55.1 ms ±   2.9 ms
>  Range (min … max):    51.1 ms …  61.2 ms    52 runs
> 
> Summary
>  'new' ran
>    1.33 ± 0.09 times faster than 'old'
> 
> But in a larger repo, such as the LLVM project repo below, we get even
> better results:
> 
> Benchmark 1: old
>  Time (mean ± σ):      1.974 s ±  0.006 s
>  Range (min … max):    1.960 s …  1.983 s    10 runs
> 
> Benchmark 2: new
>  Time (mean ± σ):     262.9 ms ±   2.4 ms
>  Range (min … max):   257.7 ms … 266.2 ms    11 runs
> 
> Summary
>  'new' ran
>    7.51 ± 0.07 times faster than 'old'

Hyperfine do looks better. I will put this into commit message and
cover letter in v6.

> 
> Signed-off-by: Lidong Yan <502024330056@xxxxxxxxxxxxxxxx>
> Signed-off-by: Derrick Stolee <stolee@xxxxxxxxx>
> ---
> revision.c           | 22 +++++++++++-----------
> t/t4216-log-bloom.sh | 23 ++++++++++++++---------
> 2 files changed, 25 insertions(+), 20 deletions(-)
> 
> diff --git a/revision.c b/revision.c
> index 4c09b594c55..ca8c1dde8ca 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -675,12 +675,11 @@ static int forbid_bloom_filters(struct pathspec *spec)
> {
> if (spec->has_wildcard)
> return 1;
> - if (spec->nr > 1)
> - return 1;
> if (spec->magic & ~PATHSPEC_LITERAL)
> return 1;
> - if (spec->nr && (spec->items[0].magic & ~PATHSPEC_LITERAL))
> - return 1;
> + for (size_t nr = 0; nr < spec->nr; nr++)
> + if (spec->items[nr].magic & ~PATHSPEC_LITERAL)
> + return 1;
> 
> return 0;
> }
> @@ -733,13 +732,14 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs)
> if (!revs->pruning.pathspec.nr)
> return;
> 
> - revs->bloom_keyvecs_nr = 1;
> - CALLOC_ARRAY(revs->bloom_keyvecs, 1);
> -
> - if (convert_pathspec_to_filter(&revs->pruning.pathspec.items[0],
> -       &revs->bloom_keyvecs[0],
> -       revs->bloom_filter_settings))
> - goto fail;
> + revs->bloom_keyvecs_nr = revs->pruning.pathspec.nr;
> + CALLOC_ARRAY(revs->bloom_keyvecs, revs->bloom_keyvecs_nr);
> + for (int i = 0; i < revs->pruning.pathspec.nr; i++) {
> + if (convert_pathspec_to_filter(&revs->pruning.pathspec.items[i],
> +       &revs->bloom_keyvecs[i],
> +       revs->bloom_filter_settings))
> + goto fail;
> + }
> 
> if (trace2_is_enabled() && !bloom_filter_atexit_registered) {
> atexit(trace2_bloom_filter_statistics_atexit);
> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> index 8910d53cac1..639868ac562 100755
> --- a/t/t4216-log-bloom.sh
> +++ b/t/t4216-log-bloom.sh
> @@ -66,8 +66,9 @@ sane_unset GIT_TRACE2_CONFIG_PARAMS
> 
> setup () {
> rm -f "$TRASH_DIRECTORY/trace.perf" &&
> - git -c core.commitGraph=false log --pretty="format:%s" $1 >log_wo_bloom &&
> - GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.perf" git -c core.commitGraph=true log --pretty="format:%s" $1 >log_w_bloom
> + eval git -c core.commitGraph=false log --pretty="format:%s" "$1" >log_wo_bloom &&
> + eval "GIT_TRACE2_PERF=\"$TRASH_DIRECTORY/trace.perf\"" \
> + git -c core.commitGraph=true log --pretty="format:%s" "$1" >log_w_bloom
> }
> 
> test_bloom_filters_used () {
> @@ -138,10 +139,6 @@ test_expect_success 'git log with --walk-reflogs does not use Bloom filters' '
> test_bloom_filters_not_used "--walk-reflogs -- A"
> '
> 
> -test_expect_success 'git log -- multiple path specs does not use Bloom filters' '
> - test_bloom_filters_not_used "-- file4 A/file1"
> -'
> -
> test_expect_success 'git log -- "." pathspec at root does not use Bloom filters' '
> test_bloom_filters_not_used "-- ."
> '
> @@ -151,9 +148,17 @@ test_expect_success 'git log with wildcard that resolves to a single path uses B
> test_bloom_filters_used "-- *renamed"
> '
> 
> -test_expect_success 'git log with wildcard that resolves to a multiple paths does not uses Bloom filters' '
> - test_bloom_filters_not_used "-- *" &&
> - test_bloom_filters_not_used "-- file*"
> +test_expect_success 'git log with multiple literal paths uses Bloom filter' '
> + test_bloom_filters_used "-- file4 A/file1" &&
> + test_bloom_filters_used "-- *" &&
> + test_bloom_filters_used "-- file*"
> +'
> +
> +test_expect_success 'git log with path contains a wildcard does not use Bloom filter' '
> + test_bloom_filters_not_used "-- file\*" &&
> + test_bloom_filters_not_used "-- A/\* file4" &&
> + test_bloom_filters_not_used "-- file4 A/\*" &&
> + test_bloom_filters_not_used "-- * A/\*"
> '
> 
> test_expect_success 'setup - add commit-graph to the chain without Bloom filters' '
> -- 
> 2.47.2.vfs.0.2
> 

Looks great, I will apply this above patch 3.

Thanks,
Lidong






[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux