Re: [PATCH v3 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec

Lidong Yan <502024330056@xxxxxxxxxxxxxxxx> · Wed, 2 Jul 2025 23:49:04 +0800

Patrick Steinhardt <ps@xxxxxx> writes:
> 
> On Sat, Jun 28, 2025 at 12:21:39PM +0800, Lidong Yan wrote:
>> diff --git a/bloom.h b/bloom.h
>> index 6e46489a20..9e4e832c8c 100644
>> --- a/bloom.h
>> +++ b/bloom.h
>> @@ -74,6 +74,11 @@ struct bloom_key {
>> uint32_t *hashes;
>> };
>> 
>> +struct bloom_keyvec {
>> + size_t count;
>> + struct bloom_key key[FLEX_ARRAY];
>> +};
>> +
> 
> A short comment would help readers understand what the intent of this
> data structure is.

Understood, will add comment for struct bloom_keyvec in v4.

> 
>> int load_bloom_filter_from_graph(struct commit_graph *g,
>> struct bloom_filter *filter,
>> uint32_t graph_pos);
>> @@ -100,6 +105,17 @@ void add_key_to_filter(const struct bloom_key *key,
>> void init_bloom_filters(void);
>> void deinit_bloom_filters(void);
>> 
>> +struct bloom_keyvec *create_bloom_keyvec(size_t count);
>> +void destroy_bloom_keyvec(struct bloom_keyvec *vec);
> 
> These functions are named very unusually for us -- the first version of
> this patch series was following our coding guidelines, but this version
> here isn't anymore.
> 
> - The primary data structure that a subsystem 'S' deals with is called
>   `struct S`. Functions that operate on `struct S` are named
>   `S_<verb>()` and should generally receive a pointer to `struct S` as
>   first parameter. E.g.
> 
> Second, the functions should probably be called `*_new()` and `*_free()`
> instead of `create_*()` and `destroy_*()`.

Though I think create and destroy doesn’t match the 'operate on struct S’
definition, I will rename these function to *_verb in v4.

> 
>> +static inline void fill_bloom_keyvec_key(const char *data, size_t len,
>> + struct bloom_keyvec *vec, size_t nr,
>> + const struct bloom_filter_settings *settings)
>> +{
>> + assert(nr < vec->count);
>> + fill_bloom_key(data, len, &vec->key[nr], settings);
>> +}
>> +
> 
> Similarly, this should probably be called `bloom_keyvec_fill_key()`.

I initially wanted the new function to have a name similar to fill_bloom_key,
but perhaps I should consider renaming it.

> 
>> enum bloom_filter_computed {
>> BLOOM_NOT_COMPUTED = (1 << 0),
>> BLOOM_COMPUTED     = (1 << 1),
>> @@ -137,4 +153,8 @@ int bloom_filter_contains(const struct bloom_filter *filter,
>>  const struct bloom_key *key,
>>  const struct bloom_filter_settings *settings);
>> 
>> +int bloom_filter_contains_vec(const struct bloom_filter *filter,
>> +      const struct bloom_keyvec *v,
>> +      const struct bloom_filter_settings *settings);
>> +
>> #endif
> 
> This one looks alright though.
> 
>> diff --git a/revision.c b/revision.c
>> index afee111196..3aa544c137 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -779,11 +782,8 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs,
>> return -1;
>> }
>> 
>> - for (j = 0; result && j < revs->bloom_keys_nr; j++) {
>> - result = bloom_filter_contains(filter,
>> -       &revs->bloom_keys[j],
>> -       revs->bloom_filter_settings);
>> - }
>> + result = bloom_filter_contains_vec(filter, revs->bloom_keyvecs[0],
>> +   revs->bloom_filter_settings);
>> 
>> if (result)
>> count_bloom_filter_maybe++;
> 
> This conversion feels wrong to me. Why don't we end up iterating through
> `revs->bloom_keyvecs_nr` here?  We do indeed change it back in the next
> patch to use a for loop.

My original intention was to include all the loop-related logic in [PATCH 2/2].
However, the lack of a loop here does make the code look error-prone. I will
add the loop in v4.

> 
>> @@ -3230,10 +3230,10 @@ void release_revisions(struct rev_info *revs)
>> line_log_free(revs);
>> oidset_clear(&revs->missing_commits);
>> 
>> - for (int i = 0; i < revs->bloom_keys_nr; i++)
>> - clear_bloom_key(&revs->bloom_keys[i]);
>> - FREE_AND_NULL(revs->bloom_keys);
>> - revs->bloom_keys_nr = 0;
>> + for (int i = 0; i < revs->bloom_keyvecs_nr; i++)
> 
> It's puzzling that the number of keys is declared as `int`. It's not an
> issue introduced by you, but can we maybe fix it while at it?

Yes, of course.

Thanks for your review,
Lidong