Re: [PATCH 0/2] bloom: use bloom filter given multiple pathspec

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <gitster@xxxxxxxxx> writes:
> 
> Lidong Yan <yldhome2d2@xxxxxxxxx> writes:
> 
>> git won't use bloom filter for multiple pathspec, which makes the command
> 
> Let's get the terminology straight.  A pathspec consists of one or
> more pathspec elements (or pathspec items).

Thanks for the clarification. I will be more precise with the terminology in v2.

> Also, "git won't" is overly general.  The series title shares the
> same issue ("given multiple pathspec" does not even hint that this
> is about revision traversal---you are not making filter used with
> pathspec with more than one element in other code paths).
> 
> Perhaps like:
> 
>    The revision traversal limited by pathspec has optimization when
>    the pathspec has only one element, it does not use any pathspec
>    magic (other than literal), and there is no wildcard.
> 
>    While it is much harder to lift the latter two limitations,
>    supporting a pathspec with multiple elements is relatively easy.
>    Just make sure we hash each of them separately and ask the bloom
>    filter about them, and if we see none of them can possibly be
>    affected by the commit, we can skip without tree comparison.
> 
> or something along that line?
> 

What you wrote makes perfect sense to me, I’ll just copy and paste
those paragraphs into my cover letter. And the title would be
  "bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal"

> Can we have a set of real tests to make sure that the updated filter
> code still identifies commits that touch the files without false
> negatives?  False positives are OK as we will follow them with real
> tree comparison to determine what exactly got changed, but false
> negatives are absolute no-no.
> 
> Testing to see that the filter code path is activated is much less
> interesting than the filter code path still functions correctly with
> these changes presented here.  I have a feeling that with the
> changes to the test in this series, you wouldn't even find a bug
> where you simply added subpaths for all pathspec elements into a
> single array and use the original "bloom has to say 'possibly yes'
> to all array elements" logic (which would incorrectly require that
> both file1 and file2 must be modified).

I assume that t4216/test_bloom_filters_used has already verified that
using bloom filters with multiple pathspec elements produces the same
results as when bloom filters are not used. But I would love to add more
test cases to check no false negative happened.

Thanks,
Lidong




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux