Re: [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Patrick Steinhardt <ps@xxxxxx> writes:

> On Thu, Aug 07, 2025 at 01:12:43PM +0800, Lidong Yan wrote:
>
> In the subject it should be s/enable bloom/enable Bloom.
>
>> When traversing commits, a pathspec item can be used to limit the
>> traversal to commits that modify the specified paths. And the
>> commit-graph includes a Bloom filter to exclude commits that definitely
>> did not modify a given pathspec item. During commit traversal, the
>> Bloom filter can significantly improve performance. However, it is
>> disabled if the specified pathspec item contains wildcard characters
>> or magic signatures.
>
> Let's add a paragraph here, as we now switch into the "what is being
> done mode".

I agree, a paragraph break is very welcome here.  I was confused
when I read this first that you are suggesting to add a new
paragraph with unspecified contents.

>> Also Enable Bloom filter if magic signature is not "exclude" or
>> "icase".
>
> This explains what is done, but not why this is safe to do.

And for that, the enumeration needs to be done inclusively, e.g. "we
know :(literal) is OK because we can do X; we know :(glob) is OK
because we can do Y".

The numbers are impressive ;-)

>> diff --git a/revision.c b/revision.c
>> index 18f300d455..ef8c0b6eca 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -671,12 +671,13 @@ static void trace2_bloom_filter_statistics_atexit(void)
>>  
>>  static int forbid_bloom_filters(struct pathspec *spec)
>>  {
>> -	if (spec->has_wildcard)
>> -		return 1;
>> -	if (spec->magic & ~PATHSPEC_LITERAL)
>> +	int forbid_mask =
>
> The mask should be `unsigned`.
>
>> +		PATHSPEC_EXCLUDE | PATHSPEC_ICASE;
>
> I think instead of a forbid-mask we should use an allow-mask. Otherwise
> it can happen quite easily that we add new magic that isn't compatible
> with Bloom filters but forget to update this part here. I'd rather be
> slow but correct than fast but incorrect.

Good suggestion.  From the use of &-, it is obvious it is a mask, but
"allow-mask" is not clear among what kind of things some are
allowed, so how about calling it:

    unsigned allowed_magic;

Thanks.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux