Re: [PATCH v4] bloom: enable bloom filter with wildcard pathspec in revision traversal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Lidong Yan <yldhome2d2@xxxxxxxxx> writes:
> 
> When traversing commits, a pathspec item can be used to limit the
> traversal to commits that modify the specified paths. And the
> commit-graph includes a Bloom filter to exclude commits that definitely
> did not modify a given pathspec item. During commit traversal, the
> Bloom filter can significantly improve performance. However, it is
> disabled if the specified pathspec item contains wildcard characters
> or magic signatures.
> 
> For performance reason, enable Bloom filter even if a pathspec item
> contains wildcard characters by filtering only the non-wildcard part of
> the pathspec item.
> 
> The function of pathspec magic signature is generally to narrow down
> the path specified by the pathspecs. So, enable Bloom filter when
> the magic signature is "top", "glob", "attr", "--depth" or "literal".
> "exclude" is used to select paths other than the specified path, rather
> than serving as a filtering function, so it cannot be used together with
> the Bloom filter. Since Bloom filter is not case insensitive even in
> case insensitive system (e.g. MacOS), it cannot be used together with
> "icase" magic.
> 
> With this optimization, we get some improvements for pathspecs with
> wildcards or magic signatures. First, in the Git repository we see these
> modest results:
> 
> git log -100 -- "t/*"
> 
> Benchmark 1: new
>  Time (mean ± σ):      20.4 ms ±   0.6 ms
>  Range (min … max):    19.3 ms …  24.4 ms
> 
> Benchmark 2: old
>  Time (mean ± σ):      23.4 ms ±   0.5 ms
>  Range (min … max):    22.5 ms …  24.7 ms
> 
> git log -100 -- ":(top)t"
> 
> Benchmark 1: new
>  Time (mean ± σ):      16.2 ms ±   0.4 ms
>  Range (min … max):    15.3 ms …  17.2 ms
> 
> Benchmark 2: old
>  Time (mean ± σ):      18.6 ms ±   0.5 ms
>  Range (min … max):    17.6 ms …  20.4 ms
> 
> But in a larger repo, such as the LLVM project repo below, we get even
> better results:
> 
> git log -100 -- "libc/*"
> 
> Benchmark 1: new
>  Time (mean ± σ):      16.0 ms ±   0.6 ms
>  Range (min … max):    14.7 ms …  17.8 ms
> 
> Benchmark 2: old
>  Time (mean ± σ):      26.7 ms ±   0.5 ms
>  Range (min … max):    25.4 ms …  27.8 ms
> 
> git log -100 -- ":(top)libc"
> 
> Benchmark 1: new
>  Time (mean ± σ):      15.6 ms ±   0.6 ms
>  Range (min … max):    14.4 ms …  17.7 ms
> 
> Benchmark 2: old
>  Time (mean ± σ):      19.6 ms ±   0.5 ms
>  Range (min … max):    18.6 ms …  20.6 ms
> 
> Signed-off-by: Lidong Yan <yldhome2d2@xxxxxxxxx>
> [jc: avoid allocating zero length path in
> convert_pathspec_to_bloom_keyvec()]
> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
> ---
> revision.c           | 45 +++++++++++++++++++++++++++-----------------
> t/t4216-log-bloom.sh | 31 ++++++++++++++++++++++++++----
> 2 files changed, 55 insertions(+), 21 deletions(-)
> 
> diff --git a/revision.c b/revision.c
> index 18f300d455..79372fd483 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -671,12 +671,17 @@ static void trace2_bloom_filter_statistics_atexit(void)
> 
> static int forbid_bloom_filters(struct pathspec *spec)
> {
> - if (spec->has_wildcard)
> - return 1;
> - if (spec->magic & ~PATHSPEC_LITERAL)
> + unsigned int allowed_magic =
> + PATHSPEC_FROMTOP |
> + PATHSPEC_MAXDEPTH |
> + PATHSPEC_LITERAL |
> + PATHSPEC_GLOB |
> + PATHSPEC_ATTR;
> +
> + if (spec->magic & ~allowed_magic)
> return 1;
> for (size_t nr = 0; nr < spec->nr; nr++)
> - if (spec->items[nr].magic & ~PATHSPEC_LITERAL)
> + if (spec->items[nr].magic & ~allowed_magic)
> return 1;
> 
> return 0;
> @@ -691,26 +696,32 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
> char *path_alloc = NULL;
> const char *path;
> size_t len;
> - int res = 0;
> 
> + len = pi->nowildcard_len;
> + if (len != pi->len) {
> + /*
> + * for path like "dir/file*", nowildcard part would be
> + * "dir/file", but only "dir" should be used for the
> + * bloom filter
> + */
> + while (len > 0 && pi->match[len - 1] != '/')
> + len--;
> + }
> /* remove single trailing slash from path, if needed */
> - if (pi->len > 0 && pi->match[pi->len - 1] == '/') {
> - path_alloc = xmemdupz(pi->match, pi->len - 1);
> + if (len > 0 && pi->match[len - 1] == '/')
> + len--;
> +
> + if (!len)
> + return -1;
> +
> + if (len != pi->len) {
> + path_alloc = xmemdupz(pi->match, len);
> path = path_alloc;
> } else
> path = pi->match;
> 
> - len = strlen(path);
> - if (!len) {
> - res = -1;
> - goto cleanup;
> - }
> -
> *out = bloom_keyvec_new(path, len, settings);
> -
> -cleanup:
> - free(path_alloc);
> - return res;
> + return 0;
> }

I realized that I shouldn’t delete free(path_alloc) part in the patch. I wonder if
Junio could help remove that part from the patch.

Thanks,
Lidong




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux