Patrick Steinhardt <ps@xxxxxx> writes: > > On Sat, Jun 28, 2025 at 12:21:39PM +0800, Lidong Yan wrote: >> diff --git a/bloom.h b/bloom.h >> index 6e46489a20..9e4e832c8c 100644 >> --- a/bloom.h >> +++ b/bloom.h >> @@ -74,6 +74,11 @@ struct bloom_key { >> uint32_t *hashes; >> }; >> >> +struct bloom_keyvec { >> + size_t count; >> + struct bloom_key key[FLEX_ARRAY]; >> +}; >> + > > A short comment would help readers understand what the intent of this > data structure is. Understood, will add comment for struct bloom_keyvec in v4. > >> int load_bloom_filter_from_graph(struct commit_graph *g, >> struct bloom_filter *filter, >> uint32_t graph_pos); >> @@ -100,6 +105,17 @@ void add_key_to_filter(const struct bloom_key *key, >> void init_bloom_filters(void); >> void deinit_bloom_filters(void); >> >> +struct bloom_keyvec *create_bloom_keyvec(size_t count); >> +void destroy_bloom_keyvec(struct bloom_keyvec *vec); > > These functions are named very unusually for us -- the first version of > this patch series was following our coding guidelines, but this version > here isn't anymore. > > - The primary data structure that a subsystem 'S' deals with is called > `struct S`. Functions that operate on `struct S` are named > `S_<verb>()` and should generally receive a pointer to `struct S` as > first parameter. E.g. > > Second, the functions should probably be called `*_new()` and `*_free()` > instead of `create_*()` and `destroy_*()`. Though I think create and destroy doesn’t match the 'operate on struct S’ definition, I will rename these function to *_verb in v4. > >> +static inline void fill_bloom_keyvec_key(const char *data, size_t len, >> + struct bloom_keyvec *vec, size_t nr, >> + const struct bloom_filter_settings *settings) >> +{ >> + assert(nr < vec->count); >> + fill_bloom_key(data, len, &vec->key[nr], settings); >> +} >> + > > Similarly, this should probably be called `bloom_keyvec_fill_key()`. I initially wanted the new function to have a name similar to fill_bloom_key, but perhaps I should consider renaming it. > >> enum bloom_filter_computed { >> BLOOM_NOT_COMPUTED = (1 << 0), >> BLOOM_COMPUTED = (1 << 1), >> @@ -137,4 +153,8 @@ int bloom_filter_contains(const struct bloom_filter *filter, >> const struct bloom_key *key, >> const struct bloom_filter_settings *settings); >> >> +int bloom_filter_contains_vec(const struct bloom_filter *filter, >> + const struct bloom_keyvec *v, >> + const struct bloom_filter_settings *settings); >> + >> #endif > > This one looks alright though. > >> diff --git a/revision.c b/revision.c >> index afee111196..3aa544c137 100644 >> --- a/revision.c >> +++ b/revision.c >> @@ -779,11 +782,8 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs, >> return -1; >> } >> >> - for (j = 0; result && j < revs->bloom_keys_nr; j++) { >> - result = bloom_filter_contains(filter, >> - &revs->bloom_keys[j], >> - revs->bloom_filter_settings); >> - } >> + result = bloom_filter_contains_vec(filter, revs->bloom_keyvecs[0], >> + revs->bloom_filter_settings); >> >> if (result) >> count_bloom_filter_maybe++; > > This conversion feels wrong to me. Why don't we end up iterating through > `revs->bloom_keyvecs_nr` here? We do indeed change it back in the next > patch to use a for loop. My original intention was to include all the loop-related logic in [PATCH 2/2]. However, the lack of a loop here does make the code look error-prone. I will add the loop in v4. > >> @@ -3230,10 +3230,10 @@ void release_revisions(struct rev_info *revs) >> line_log_free(revs); >> oidset_clear(&revs->missing_commits); >> >> - for (int i = 0; i < revs->bloom_keys_nr; i++) >> - clear_bloom_key(&revs->bloom_keys[i]); >> - FREE_AND_NULL(revs->bloom_keys); >> - revs->bloom_keys_nr = 0; >> + for (int i = 0; i < revs->bloom_keyvecs_nr; i++) > > It's puzzling that the number of keys is declared as `int`. It's not an > issue introduced by you, but can we maybe fix it while at it? Yes, of course. Thanks for your review, Lidong