On 7/4/2025 7:14 AM, Lidong Yan wrote: > The revision traversal limited by pathspec has optimization when > the pathspec has only one element. To support optimization for > multiple pathspec items, we need to modify the data structures > in struct rev_info. You are correct that the revision-walking abandons bloom filters when there are multiple pathspecs, and fixing this is a valuable effort. The need for this change is subtle and could use some extra context to be sure reviewers understand: This change is writing over some code that was created in c525ce95b46 (commit-graph: check all leading directories in changed path Bloom filters, 2020-07-01) to allow storing multiple bloom keys during the revision walk. The multiple keys are focusing on multiple path components of the literal pathspec. The point is that after the initialization, the bloom key array is used directly as a filter for reporting TREESAME commits: a commit is automatically reported as TREESAME to its first parent if any bloom key results in a "No, definitely not changed" result with that commit's bloom filter. The reason we need a new data structure is that we need to adjust the conditionals. BEFORE: "NOT TREESAME if there EXISTS a bloom key that reports NO" AFTER: "NOT TREESAME if FOR EVERY pathspec there EXISTS a bloom key that reports NO." This "FOR EVERY" condition makes it impossible to use a flat array of bloom keys for multiple pathspecs, justifying this change. What is further confusing here is that we already have logic that deals with arrays of bloom keys, so I expected that the vector was the single structure storing a list of those arrays. Instead, the vector is replacing the array itself. This is made clear by using the vector immediately in the existing implementation. > +struct bloom_keyvec *bloom_keyvec_new(size_t count) > +{ > + struct bloom_keyvec *vec; > + size_t sz = sizeof(struct bloom_keyvec); > + sz += count * sizeof(struct bloom_key); > + vec = (struct bloom_keyvec *)xcalloc(1, sz); You could use CALLOC_ARRAY() to simplify this and drop the 'sz' variable. > + vec->count = count; > + return vec; > +} > + > +void bloom_keyvec_free(struct bloom_keyvec *vec) > +{ > + if (!vec) > + return; > + for (size_t nr = 0; nr < vec->count; nr++) > + bloom_key_clear(&vec->key[nr]); > + free(vec); > +} > + > static int pathmap_cmp(const void *hashmap_cmp_fn_data UNUSED, > const struct hashmap_entry *eptr, > const struct hashmap_entry *entry_or_key, > @@ -541,6 +560,18 @@ int bloom_filter_contains(const struct bloom_filter *filter, > return 1; > } > > +int bloom_filter_contains_vec(const struct bloom_filter *filter, > + const struct bloom_keyvec *vec, > + const struct bloom_filter_settings *settings) > +{ > + int ret = 1; > + > + for (size_t nr = 0; ret > 0 && nr < vec->count; nr++) > + ret = bloom_filter_contains(filter, &vec->key[nr], settings); > + > + return ret; > +} This implementation is where the subtle detail comes in. Might be worth a comment to say "if any key in this list is not contained in the filter, then the filter doesn't match this vector."