On 7/4/2025 7:14 AM, Lidong Yan wrote: > To enable optimize multiple pathspec items in revision traversal, > return 0 if all pathspec item is literal in forbid_bloom_filters(). > Add code to initialize and check each pathspec item's bloom_keyvec. > > Add new function release_revisions_bloom_keyvecs() to free all bloom > keyvec owned by rev_info. > > Add new test cases in t/t4216-log-bloom.sh to ensure > - consistent results between the optimization for multiple pathspec > items using bloom filter and the case without bloom filter > optimization. > - does not use bloom filter if any pathspec item is not literal. This would be a great time to add some performance statistics when using this feature with multiple pathspecs on some standard repos (git and the Linux kernel repo are two good examples). We don't have a great performance script for this, since each test repo will have different paths to use for comparisons, but you can use 'hyperfine' to assemble your own comparisons before and after this change and report them here (and in your cover letter). > Signed-off-by: Lidong Yan <502024330056@xxxxxxxxxxxxxxxx> > --- > revision.c | 118 ++++++++++++++++++++++++------------------- > t/t4216-log-bloom.sh | 23 +++++---- > 2 files changed, 79 insertions(+), 62 deletions(-) > > diff --git a/revision.c b/revision.c > index 7cbb49617d..9a77c0d0bc 100644 > --- a/revision.c > +++ b/revision.c > @@ -675,16 +675,17 @@ static int forbid_bloom_filters(struct pathspec *spec) > { > if (spec->has_wildcard) > return 1; > - if (spec->nr > 1) > - return 1; > if (spec->magic & ~PATHSPEC_LITERAL) > return 1; > - if (spec->nr && (spec->items[0].magic & ~PATHSPEC_LITERAL)) > - return 1; > + for (size_t nr = 0; nr < spec->nr; nr++) > + if (spec->items[nr].magic & ~PATHSPEC_LITERAL) > + return 1; This is a good check: if any is non-literal, then we can't use bloom filters. > +static void release_revisions_bloom_keyvecs(struct rev_info *revs); > + > static void prepare_to_use_bloom_filter(struct rev_info *revs) > { > struct pathspec_item *pi; > @@ -692,7 +693,7 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) > char *path_alloc = NULL; > const char *path, *p; > size_t len; > - int path_component_nr = 1; > + int path_component_nr; We can move this into the interior of the loop, right? > if (!revs->commits) > return; > @@ -709,50 +710,52 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs) > if (!revs->pruning.pathspec.nr) > return; > > - pi = &revs->pruning.pathspec.items[0]; > - > - /* remove single trailing slash from path, if needed */ > - if (pi->len > 0 && pi->match[pi->len - 1] == '/') { > - path_alloc = xmemdupz(pi->match, pi->len - 1); > - path = path_alloc; > - } else > - path = pi->match; > - > - len = strlen(path); > - if (!len) { > - revs->bloom_filter_settings = NULL; > - free(path_alloc); > - return; > - } > - > - p = path; > - while (*p) { > - /* > - * At this point, the path is normalized to use Unix-style > - * path separators. This is required due to how the > - * changed-path Bloom filters store the paths. > - */ > - if (*p == '/') > - path_component_nr++; > - p++; > - } > - > - revs->bloom_keyvecs_nr = 1; > - CALLOC_ARRAY(revs->bloom_keyvecs, 1); > - bloom_keyvec = bloom_keyvec_new(path_component_nr); > - revs->bloom_keyvecs[0] = bloom_keyvec; > - > - bloom_keyvec_fill_key(path, len, bloom_keyvec, 0, > - revs->bloom_filter_settings); > - path_component_nr = 1; The size of this diff is unfortunate. I wonder if there could first be an extraction of this logic to operate on a single pathspec and bloom_keyvec in a way that would be an obvious code move, then this patch could call that method in a loop now that we have an array of bloom_keyvecs. I think it would make a cleaner patch and a cleaner final result. > + revs->bloom_keyvecs_nr = revs->pruning.pathspec.nr; > + CALLOC_ARRAY(revs->bloom_keyvecs, revs->bloom_keyvecs_nr); > + for (int i = 0; i < revs->pruning.pathspec.nr; i++) { > + pi = &revs->pruning.pathspec.items[i]; > + path_component_nr = 1; > + > + /* remove single trailing slash from path, if needed */ > + if (pi->len > 0 && pi->match[pi->len - 1] == '/') { > + path_alloc = xmemdupz(pi->match, pi->len - 1); > + path = path_alloc; > + } else > + path = pi->match; > + > + len = strlen(path); > + if (!len) > + goto fail; > + > + p = path; > + while (*p) { > + /* > + * At this point, the path is normalized to use > + * Unix-style path separators. This is required due to > + * how the changed-path Bloom filters store the paths. > + */ > + if (*p == '/') > + path_component_nr++; > + p++; > + } > > - p = path + len - 1; > - while (p > path) { > - if (*p == '/') > - bloom_keyvec_fill_key(path, p - path, bloom_keyvec, > - path_component_nr++, > - revs->bloom_filter_settings); > - p--; > + bloom_keyvec = bloom_keyvec_new(path_component_nr); > + revs->bloom_keyvecs[i] = bloom_keyvec; > + > + bloom_keyvec_fill_key(path, len, bloom_keyvec, 0, > + revs->bloom_filter_settings); > + path_component_nr = 1; > + > + p = path + len - 1; > + while (p > path) { > + if (*p == '/') > + bloom_keyvec_fill_key(path, p - path, > + bloom_keyvec, > + path_component_nr++, > + revs->bloom_filter_settings); > + p--; > + } > + FREE_AND_NULL(path_alloc); ... > -test_expect_success 'git log -- multiple path specs does not use Bloom filters' ' > - test_bloom_filters_not_used "-- file4 A/file1" > -' > - Love to see this. > test_expect_success 'git log -- "." pathspec at root does not use Bloom filters' ' > test_bloom_filters_not_used "-- ." > ' > @@ -151,9 +148,17 @@ test_expect_success 'git log with wildcard that resolves to a single path uses B > test_bloom_filters_used "-- *renamed" > ' > > -test_expect_success 'git log with wildcard that resolves to a multiple paths does not uses Bloom filters' ' > - test_bloom_filters_not_used "-- *" && > - test_bloom_filters_not_used "-- file*" > +test_expect_success 'git log with multiple literal paths uses Bloom filter' ' > + test_bloom_filters_used "-- file4 A/file1" && > + test_bloom_filters_used "-- *" && > + test_bloom_filters_used "-- file*" > +' > + > +test_expect_success 'git log with path contains a wildcard does not use Bloom filter' ' > + test_bloom_filters_not_used "-- file\*" && > + test_bloom_filters_not_used "-- A/\* file4" && > + test_bloom_filters_not_used "-- file4 A/\*" && > + test_bloom_filters_not_used "-- * A/\*" > ' And these new test cases are great. Thanks for this work. I'm happy to see the feature be added and my suggestions are purely cosmetic as the proof is in your tests. Thanks, -Stolee