Derrick Stolee <stolee@xxxxxxxxx> wrote: > > BEFORE: "NOT TREESAME if there EXISTS a bloom key that reports NO" > > AFTER: "NOT TREESAME if FOR EVERY pathspec there EXISTS a bloom key > that reports NO." > > This "FOR EVERY" condition makes it impossible to use a flat array > of bloom keys for multiple pathspecs, justifying this change. > > What is further confusing here is that we already have logic that > deals with arrays of bloom keys, so I expected that the vector was > the single structure storing a list of those arrays. Instead, the > vector is replacing the array itself. This is made clear by using > the vector immediately in the existing implementation. I think the problem here is that it clearly enough in my comments. When writing the code, I thought about converting the original one-dimensional array struct bloom_key *keys into a two-dimensional array struct bloom_key **. Then I came up with the idea that this two-dimensional array could be designed as struct bloom_keyvec *keyvecs, which might be clearer. Each struct bloom_keyvec would represent all the bloom_key elements for a single pathspec item. >> +struct bloom_keyvec *bloom_keyvec_new(size_t count) >> +{ >> + struct bloom_keyvec *vec; >> + size_t sz = sizeof(struct bloom_keyvec); >> + sz += count * sizeof(struct bloom_key); >> + vec = (struct bloom_keyvec *)xcalloc(1, sz); > You could use CALLOC_ARRAY() to simplify this and drop > the 'sz' variable. You are right. I think you suggest to write struct bloom_keyvec like: ———————- | count | ——————— | *keys | ——> key0 | key1 | key2 | … | ——————— And I am doing here makes struct bloom_keyvec looks like ——————— | count | ——————— | key[0] | ——————— | key[1] | ——————— | … | Although bloom_keyvec_new() appears more complex, the advantage is that bloom_keyvec_destroy() no longer needs to free keys manually. And if I understand correctly, junio had suggested to use the second way [here](https://lore.kernel.org/git/xmqqtt43u36t.fsf@gitster.g/). > >> + vec->count = count; >> + return vec; >> +} >> + >> +void bloom_keyvec_free(struct bloom_keyvec *vec) >> +{ >> + if (!vec) >> + return; >> + for (size_t nr = 0; nr < vec->count; nr++) >> + bloom_key_clear(&vec->key[nr]); >> + free(vec); >> +} >> + >> static int pathmap_cmp(const void *hashmap_cmp_fn_data UNUSED, >> const struct hashmap_entry *eptr, >> const struct hashmap_entry *entry_or_key, >> @@ -541,6 +560,18 @@ int bloom_filter_contains(const struct bloom_filter *filter, >> return 1; >> } >> >> +int bloom_filter_contains_vec(const struct bloom_filter *filter, >> + const struct bloom_keyvec *vec, >> + const struct bloom_filter_settings *settings) >> +{ >> + int ret = 1; >> + >> + for (size_t nr = 0; ret > 0 && nr < vec->count; nr++) >> + ret = bloom_filter_contains(filter, &vec->key[nr], settings); >> + >> + return ret; >> +} > > This implementation is where the subtle detail comes in. Might be worth > a comment to say "if any key in this list is not contained in the filter, > then the filter doesn't match this vector." I will add comment in the next version Thank you for your review, Lidong