On Sat, Jun 28, 2025 at 12:21:39PM +0800, Lidong Yan wrote: > diff --git a/bloom.h b/bloom.h > index 6e46489a20..9e4e832c8c 100644 > --- a/bloom.h > +++ b/bloom.h > @@ -74,6 +74,11 @@ struct bloom_key { > uint32_t *hashes; > }; > > +struct bloom_keyvec { > + size_t count; > + struct bloom_key key[FLEX_ARRAY]; > +}; > + A short comment would help readers understand what the intent of this data structure is. > int load_bloom_filter_from_graph(struct commit_graph *g, > struct bloom_filter *filter, > uint32_t graph_pos); > @@ -100,6 +105,17 @@ void add_key_to_filter(const struct bloom_key *key, > void init_bloom_filters(void); > void deinit_bloom_filters(void); > > +struct bloom_keyvec *create_bloom_keyvec(size_t count); > +void destroy_bloom_keyvec(struct bloom_keyvec *vec); These functions are named very unusually for us -- the first version of this patch series was following our coding guidelines, but this version here isn't anymore. - The primary data structure that a subsystem 'S' deals with is called `struct S`. Functions that operate on `struct S` are named `S_<verb>()` and should generally receive a pointer to `struct S` as first parameter. E.g. Second, the functions should probably be called `*_new()` and `*_free()` instead of `create_*()` and `destroy_*()`. > +static inline void fill_bloom_keyvec_key(const char *data, size_t len, > + struct bloom_keyvec *vec, size_t nr, > + const struct bloom_filter_settings *settings) > +{ > + assert(nr < vec->count); > + fill_bloom_key(data, len, &vec->key[nr], settings); > +} > + Similarly, this should probably be called `bloom_keyvec_fill_key()`. > enum bloom_filter_computed { > BLOOM_NOT_COMPUTED = (1 << 0), > BLOOM_COMPUTED = (1 << 1), > @@ -137,4 +153,8 @@ int bloom_filter_contains(const struct bloom_filter *filter, > const struct bloom_key *key, > const struct bloom_filter_settings *settings); > > +int bloom_filter_contains_vec(const struct bloom_filter *filter, > + const struct bloom_keyvec *v, > + const struct bloom_filter_settings *settings); > + > #endif This one looks alright though. > diff --git a/revision.c b/revision.c > index afee111196..3aa544c137 100644 > --- a/revision.c > +++ b/revision.c > @@ -779,11 +782,8 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs, > return -1; > } > > - for (j = 0; result && j < revs->bloom_keys_nr; j++) { > - result = bloom_filter_contains(filter, > - &revs->bloom_keys[j], > - revs->bloom_filter_settings); > - } > + result = bloom_filter_contains_vec(filter, revs->bloom_keyvecs[0], > + revs->bloom_filter_settings); > > if (result) > count_bloom_filter_maybe++; This conversion feels wrong to me. Why don't we end up iterating through `revs->bloom_keyvecs_nr` here? We do indeed change it back in the next patch to use a for loop. > @@ -3230,10 +3230,10 @@ void release_revisions(struct rev_info *revs) > line_log_free(revs); > oidset_clear(&revs->missing_commits); > > - for (int i = 0; i < revs->bloom_keys_nr; i++) > - clear_bloom_key(&revs->bloom_keys[i]); > - FREE_AND_NULL(revs->bloom_keys); > - revs->bloom_keys_nr = 0; > + for (int i = 0; i < revs->bloom_keyvecs_nr; i++) It's puzzling that the number of keys is declared as `int`. It's not an issue introduced by you, but can we maybe fix it while at it? Patrick