Patrick Steinhardt <ps@xxxxxx> wrote: > > On Thu, Aug 07, 2025 at 01:12:43PM +0800, Lidong Yan wrote: > > In the subject it should be s/enable bloom/enable Bloom. > >> When traversing commits, a pathspec item can be used to limit the >> traversal to commits that modify the specified paths. And the >> commit-graph includes a Bloom filter to exclude commits that definitely >> did not modify a given pathspec item. During commit traversal, the >> Bloom filter can significantly improve performance. However, it is >> disabled if the specified pathspec item contains wildcard characters >> or magic signatures. > > Let's add a paragraph here, as we now switch into the "what is being > done mode". > >> Enable Bloom filter even if a pathspec item contains wildcard >> characters by filter only the non-wildcard part of the pathspec item. > > s/by filter/by filtering/ > >> With this optimization, we get some improvements for pathspec with >> wildcard and magic signature. First, in the Git repository we see these > > "for pathspecs with wildcards or magic signatures". Will fix all the grammatical errors in next version. >> Also Enable Bloom filter if magic signature is not "exclude" or >> "icase". > > This explains what is done, but not why this is safe to do. I forgot to mention this — I’ll make sure to include it in the next commit message. > >> diff --git a/revision.c b/revision.c >> index 18f300d455..ef8c0b6eca 100644 >> --- a/revision.c >> +++ b/revision.c >> @@ -671,12 +671,13 @@ static void trace2_bloom_filter_statistics_atexit(void) >> >> static int forbid_bloom_filters(struct pathspec *spec) >> { >> - if (spec->has_wildcard) >> - return 1; >> - if (spec->magic & ~PATHSPEC_LITERAL) >> + int forbid_mask = > > The mask should be `unsigned`. > >> + PATHSPEC_EXCLUDE | PATHSPEC_ICASE; > > I think instead of a forbid-mask we should use an allow-mask. Otherwise > it can happen quite easily that we add new magic that isn't compatible > with Bloom filters but forget to update this part here. I'd rather be > slow but correct than fast but incorrect. Interesting and reasonable point — will fix. > > One thing I did wonder though: what happens if the first component > contains the wildcard? We cannot really make any use of the Bloom filter > in that case as the path we match against becomes empty. I expect that > we'll handle this just fine. But is it still more performant than not > even trying Bloom filters in the first place? If the first component contains the wildcard, convert_pathspec_to_bloom_keyvec() failed and returns -1, which lead to prepare_to_use_bloom_filter() cleanup (free and NULLing) all bloom filter related field. So the performant would be as same as bloom filter is not even tried. I actually find a bug in my code and I will add test and fix it, the code should be if (len != pi->len) while (len > 0 && pi->match[len - 1]) len—; if (len > 0 && pi->match[len - 1] == ‘/‘) len—; > We typically indent the heredoc text to the same level as the command. Got it, will fix. > >> + EOF >> + test_bloom_filters_used "-- \:\(attr\:text\)A" && >> + rm .gitattributes > > You can use `test_when_finished" instead to clean up after yourself even > in case the test fails. I see — I never knew how to deal with the issue where one failing test case causes all subsequent tests to fail. Thanks, Lidong