Lidong Yan <yldhome2d2@xxxxxxxxx> writes: > static int forbid_bloom_filters(struct pathspec *spec) > { > - if (spec->has_wildcard) > - return 1; > - if (spec->magic & ~PATHSPEC_LITERAL) > + unsigned int allowed_magic = > + PATHSPEC_FROMTOP | > + PATHSPEC_MAXDEPTH | > + PATHSPEC_LITERAL | > + PATHSPEC_GLOB | > + PATHSPEC_ATTR; > + > + if (spec->magic & ~allowed_magic) > return 1; > for (size_t nr = 0; nr < spec->nr; nr++) > - if (spec->items[nr].magic & ~PATHSPEC_LITERAL) > + if (spec->items[nr].magic & ~allowed_magic) > return 1; Quite straight-forward and easy to see that this is a simple enhancement of the existing code. Good. > @@ -693,9 +698,22 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out, > size_t len; > int res = 0; > > + len = pi->nowildcard_len; > + if (len != pi->len) { > + /* > + * for path like "/dir/file*", nowildcard part would be > + * "/dir/file", but only "/dir" should be used for the Leading "/" makes it look as if the pathspec element can begin with a slash, but it can not, can it? > + * bloom filter > + */ > + while (len > 0 && pi->match[len - 1] != '/') > + len--; In a tree that has both "builtin/" directory and "builtin.h" file, what pathspec_element do we get here when we run $ git log "builtin*" We need to be able to catch commits that touch anything in "builtin/" and also anything at the top-level that begins with "builtin", like "builtin.h" and other things that might have existed ever in the history. I think len goes down to 0 and that is the correct behaviour. The code does deal with the case where len is reduced down to zero, but a bit poorly. It would allocate a zero-length string, then realize it frees it, and returns -1. As it must know how long the resulting path is before it allocated, and by definition len is pi->len if it did not have to allocate, it should be able to take the "goto cleanup" code path before it even attempts to allocate, and it should not have to do strlen(). But the current code is not incorrect. > + } > /* remove single trailing slash from path, if needed */ > - if (pi->len > 0 && pi->match[pi->len - 1] == '/') { > - path_alloc = xmemdupz(pi->match, pi->len - 1); > + if (len > 0 && pi->match[len - 1] == '/') > + len--; > + > + if (len != pi->len) { > + path_alloc = xmemdupz(pi->match, len); > path = path_alloc; > } else > path = pi->match; > +test_expect_success 'git log with paths all contain non-wildcard part uses Bloom filter' ' > + test_bloom_filters_used "-- A/\* file4" && We'd ask the Bloom filter: skip commits that you know that can never be touching either "A/(anything)" or "file4". > + test_bloom_filters_used "-- A/file\*" && We'd ask the Bloom filter: skip commits that you know that can never be touching either "A/". > + test_bloom_filters_used "-- * A/\*" What do we ask? The second one says that a commit that might touch "A/(something)" is worth investigating, but what about the first one? Ahhh, that one is not quoted, so the shell expands to existing files (which presumably do not have any wildcard characters). OK. If the lone * were quoted, we shouldn't be using the Bloom filter. > +' > + > +test_expect_success 'git log with path only contains wildcard part does not use Bloom filter' ' > test_bloom_filters_not_used "-- file\*" && OK, this is exactly the case I wondered about in the above wrt "builtin*" and I agree with the expectation of this test. > - test_bloom_filters_not_used "-- A/\* file4" && > - test_bloom_filters_not_used "-- file4 A/\*" && > - test_bloom_filters_not_used "-- * A/\*" > + test_bloom_filters_not_used "-- file\* A/\*" && This one I understand. The first one reduces len down to 0 in the wildcard stripping loop, and makes us say "a commit that might touch any path is worth investigating", which amounts to the same thing as not using the Bloom filter at all. > + test_bloom_filters_not_used "-- file\* *" && Ditto. Even if the unquoted * may expand to many concrete paths, "file\*" that can match any path that begins with "file" that ever have existed in the history would not help skipping any commit. > + test_bloom_filters_not_used "-- \*" Ditto. > +' > + > +test_expect_success 'git log with path contains various magic signatures' ' > + cd A && > + test_bloom_filters_used "-- \:\(top\)B" && > + cd .. && > + > + test_bloom_filters_used "-- \:\(glob\)A/\*\*/C" && > + test_bloom_filters_not_used "-- \:\(icase\)FILE4" && > + test_bloom_filters_not_used "-- \:\(exclude\)A/B/C" && > + > + test_when_finished "rm -f .gitattributes" && > + cat >.gitattributes <<-EOF && > + A/file1 text > + A/B/file2 -text > + EOF > + test_bloom_filters_used "-- \:\(attr\:text\)A" > ' OK. Good to see negative test cases here, which we sometimes forget to test when showing off our shiny new toy. Taking what I suggested above, here is a possible improvement. revision.c | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git i/revision.c w/revision.c index 2a5b98390e..2a92bdda84 100644 --- i/revision.c +++ w/revision.c @@ -696,14 +696,14 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out, char *path_alloc = NULL; const char *path; size_t len; - int res = 0; + int res = -1; /* be pessimistic */ len = pi->nowildcard_len; if (len != pi->len) { /* - * for path like "/dir/file*", nowildcard part would be - * "/dir/file", but only "/dir" should be used for the - * bloom filter + * for path like "dir/file*", nowildcard part would be + * "dir/file", but only "dir" should be used for the + * bloom filter. */ while (len > 0 && pi->match[len - 1] != '/') len--; @@ -712,19 +712,17 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out, if (len > 0 && pi->match[len - 1] == '/') len--; + if (!len) + goto cleanup; + if (len != pi->len) { path_alloc = xmemdupz(pi->match, len); path = path_alloc; } else path = pi->match; - len = strlen(path); - if (!len) { - res = -1; - goto cleanup; - } - *out = bloom_keyvec_new(path, len, settings); + res = 0; cleanup: free(path_alloc);