Re: [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Patrick Steinhardt <ps@xxxxxx> wrote:
> 
> On Thu, Aug 07, 2025 at 01:12:43PM +0800, Lidong Yan wrote:
> 
> In the subject it should be s/enable bloom/enable Bloom.
> 
>> When traversing commits, a pathspec item can be used to limit the
>> traversal to commits that modify the specified paths. And the
>> commit-graph includes a Bloom filter to exclude commits that definitely
>> did not modify a given pathspec item. During commit traversal, the
>> Bloom filter can significantly improve performance. However, it is
>> disabled if the specified pathspec item contains wildcard characters
>> or magic signatures.
> 
> Let's add a paragraph here, as we now switch into the "what is being
> done mode".
> 
>> Enable Bloom filter even if a pathspec item contains wildcard
>> characters by filter only the non-wildcard part of the pathspec item.
> 
> s/by filter/by filtering/
> 
>> With this optimization, we get some improvements for pathspec with
>> wildcard and magic signature. First, in the Git repository we see these
> 
> "for pathspecs with wildcards or magic signatures".

Will fix all the grammatical errors in next version.


>> Also Enable Bloom filter if magic signature is not "exclude" or
>> "icase".
> 
> This explains what is done, but not why this is safe to do.

I forgot to mention this — I’ll make sure to include it in the next commit message. 

> 
>> diff --git a/revision.c b/revision.c
>> index 18f300d455..ef8c0b6eca 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -671,12 +671,13 @@ static void trace2_bloom_filter_statistics_atexit(void)
>> 
>> static int forbid_bloom_filters(struct pathspec *spec)
>> {
>> - if (spec->has_wildcard)
>> - return 1;
>> - if (spec->magic & ~PATHSPEC_LITERAL)
>> + int forbid_mask =
> 
> The mask should be `unsigned`.
> 
>> + PATHSPEC_EXCLUDE | PATHSPEC_ICASE;
> 
> I think instead of a forbid-mask we should use an allow-mask. Otherwise
> it can happen quite easily that we add new magic that isn't compatible
> with Bloom filters but forget to update this part here. I'd rather be
> slow but correct than fast but incorrect.

Interesting and reasonable point — will fix.

> 
> One thing I did wonder though: what happens if the first component
> contains the wildcard? We cannot really make any use of the Bloom filter
> in that case as the path we match against becomes empty. I expect that
> we'll handle this just fine. But is it still more performant than not
> even trying Bloom filters in the first place?

If the first component contains the wildcard, convert_pathspec_to_bloom_keyvec()
failed and returns -1, which lead to prepare_to_use_bloom_filter() cleanup (free and NULLing)
all bloom filter related field. So the performant would be as same as bloom filter is
not even tried.

I actually find a bug in my code and I will add test and fix it, the code should be

	if (len != pi->len)
		while (len > 0 && pi->match[len - 1])
			len—;
	if (len > 0 && pi->match[len - 1] == ‘/‘)
		len—;

> We typically indent the heredoc text to the same level as the command.

Got it, will fix.

> 
>> + EOF
>> + test_bloom_filters_used "-- \:\(attr\:text\)A" &&
>> + rm .gitattributes
> 
> You can use `test_when_finished" instead to clean up after yourself even
> in case the test fails.

I see — I never knew how to deal with the issue where one failing test case
causes all subsequent tests to fail.

Thanks,
Lidong






[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux