Lidong Yan <yldhome2d2@xxxxxxxxx> writes: > > When traversing commits, a pathspec item can be used to limit the > traversal to commits that modify the specified paths. And the > commit-graph includes a Bloom filter to exclude commits that definitely > did not modify a given pathspec item. During commit traversal, the > Bloom filter can significantly improve performance. However, it is > disabled if the specified pathspec item contains wildcard characters > or magic signatures. > > For performance reason, enable Bloom filter even if a pathspec item > contains wildcard characters by filtering only the non-wildcard part of > the pathspec item. > > The function of pathspec magic signature is generally to narrow down > the path specified by the pathspecs. So, enable Bloom filter when > the magic signature is "top", "glob", "attr", "--depth" or "literal". > "exclude" is used to select paths other than the specified path, rather > than serving as a filtering function, so it cannot be used together with > the Bloom filter. Since Bloom filter is not case insensitive even in > case insensitive system (e.g. MacOS), it cannot be used together with > "icase" magic. > > With this optimization, we get some improvements for pathspecs with > wildcards or magic signatures. First, in the Git repository we see these > modest results: > > git log -100 -- "t/*" > > Benchmark 1: new > Time (mean ± σ): 20.4 ms ± 0.6 ms > Range (min … max): 19.3 ms … 24.4 ms > > Benchmark 2: old > Time (mean ± σ): 23.4 ms ± 0.5 ms > Range (min … max): 22.5 ms … 24.7 ms > > git log -100 -- ":(top)t" > > Benchmark 1: new > Time (mean ± σ): 16.2 ms ± 0.4 ms > Range (min … max): 15.3 ms … 17.2 ms > > Benchmark 2: old > Time (mean ± σ): 18.6 ms ± 0.5 ms > Range (min … max): 17.6 ms … 20.4 ms > > But in a larger repo, such as the LLVM project repo below, we get even > better results: > > git log -100 -- "libc/*" > > Benchmark 1: new > Time (mean ± σ): 16.0 ms ± 0.6 ms > Range (min … max): 14.7 ms … 17.8 ms > > Benchmark 2: old > Time (mean ± σ): 26.7 ms ± 0.5 ms > Range (min … max): 25.4 ms … 27.8 ms > > git log -100 -- ":(top)libc" > > Benchmark 1: new > Time (mean ± σ): 15.6 ms ± 0.6 ms > Range (min … max): 14.4 ms … 17.7 ms > > Benchmark 2: old > Time (mean ± σ): 19.6 ms ± 0.5 ms > Range (min … max): 18.6 ms … 20.6 ms > > Signed-off-by: Lidong Yan <yldhome2d2@xxxxxxxxx> > [jc: avoid allocating zero length path in > convert_pathspec_to_bloom_keyvec()] > Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > --- > revision.c | 45 +++++++++++++++++++++++++++----------------- > t/t4216-log-bloom.sh | 31 ++++++++++++++++++++++++++---- > 2 files changed, 55 insertions(+), 21 deletions(-) > > diff --git a/revision.c b/revision.c > index 18f300d455..79372fd483 100644 > --- a/revision.c > +++ b/revision.c > @@ -671,12 +671,17 @@ static void trace2_bloom_filter_statistics_atexit(void) > > static int forbid_bloom_filters(struct pathspec *spec) > { > - if (spec->has_wildcard) > - return 1; > - if (spec->magic & ~PATHSPEC_LITERAL) > + unsigned int allowed_magic = > + PATHSPEC_FROMTOP | > + PATHSPEC_MAXDEPTH | > + PATHSPEC_LITERAL | > + PATHSPEC_GLOB | > + PATHSPEC_ATTR; > + > + if (spec->magic & ~allowed_magic) > return 1; > for (size_t nr = 0; nr < spec->nr; nr++) > - if (spec->items[nr].magic & ~PATHSPEC_LITERAL) > + if (spec->items[nr].magic & ~allowed_magic) > return 1; > > return 0; > @@ -691,26 +696,32 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out, > char *path_alloc = NULL; > const char *path; > size_t len; > - int res = 0; > > + len = pi->nowildcard_len; > + if (len != pi->len) { > + /* > + * for path like "dir/file*", nowildcard part would be > + * "dir/file", but only "dir" should be used for the > + * bloom filter > + */ > + while (len > 0 && pi->match[len - 1] != '/') > + len--; > + } > /* remove single trailing slash from path, if needed */ > - if (pi->len > 0 && pi->match[pi->len - 1] == '/') { > - path_alloc = xmemdupz(pi->match, pi->len - 1); > + if (len > 0 && pi->match[len - 1] == '/') > + len--; > + > + if (!len) > + return -1; > + > + if (len != pi->len) { > + path_alloc = xmemdupz(pi->match, len); > path = path_alloc; > } else > path = pi->match; > > - len = strlen(path); > - if (!len) { > - res = -1; > - goto cleanup; > - } > - > *out = bloom_keyvec_new(path, len, settings); > - > -cleanup: > - free(path_alloc); > - return res; > + return 0; > } I realized that I shouldn’t delete free(path_alloc) part in the patch. I wonder if Junio could help remove that part from the patch. Thanks, Lidong