The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it is useful to paginate this output so that iteration can
start from a given reference. This avoids having to iterate over all
references from the beginning each time when paginating through results.

This series adds a '--start-after' option to 'git-for-each-ref(1)'. When
used, the reference iteration seeks to the first reference following the
marker lexicographically. When paging, it should be noted that references
may be deleted, modified or added between invocations. Output will only
yield those references which follow the marker lexicographically. If the
marker does not exist, output begins from the first reference that would
come after it lexicographically.

This enables efficient pagination workflows like:

    git for-each-ref --count=100
    git for-each-ref --count=100 --start-after=refs/heads/branch-100
    git for-each-ref --count=100 --start-after=refs/heads/branch-200

To add this functionality, we expose the `ref_iterator` outside the
'refs/' namespace and modify `ref_iterator_seek()` to actually seek to a
given reference and only set the prefix when the
'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. On the reftable and packed
backends, the changes are simple. But since the files backend uses
'ref-cache' for reference handling, the changes there are a little more
involved, since we need to set up the right levels and the indexing.

Initially I was also planning to clean up all the `refs_for_each...()`
functions in 'refs.h' by simply using the iterator, but this bloated the
series. So I've left that for another day.

Changes in v5:
- Changes to the comments to refer to the flag
  'REF_ITERATOR_SEEK_SET_PREFIX' instead of a variable used in older
  versions. Also other small grammar fixes.
- Added a commit to remove an unnecessary else clause.
- Move seeking functionality within `for_each_fullref_in_pattern` to its
  own function.
- Fix incorrect naming in the tests.
- Link to v4: https://lore.kernel.org/r/20250711-306-git-for-each-ref-pagination-v4-0-ed3303ad5b89@xxxxxxxxx

Changes in v4:
- Patch 3/4: Move around the documentation for the flag and rename the
  seek variable to refname.
- Patch 4/4: Clean up the commit message and also the documentation.
- Link to v3: https://lore.kernel.org/r/20250708-306-git-for-each-ref-pagination-v3-0-8cfba1080be4@xxxxxxxxx

Changes in v3:
- Change the behavior of the command to exclude the marker provided. With
  this, rename the flag to '--start-after'.
- Extend the documentation to add a note about concurrent modifications
  to the reference database.
- Link to v2: https://lore.kernel.org/r/20250704-306-git-for-each-ref-pagination-v2-0-bcde14acdd81@xxxxxxxxx

Changes in v2:
- Modify 'ref_iterator_seek()' to take in flags instead of a
  'set_prefix' variable. This improves readability, where users would
  use the 'REF_ITERATOR_SEEK_SET_PREFIX' instead of simply passing '1'.
- When the set prefix flag isn't used, reset any previously set prefix.
  This ensures that the internal prefix state is always reset whenever
  we seek and unifies the behavior between 'ref_iterator_seek' and
  'ref_iterator_begin'.
- Don't allow '--skip-until' to be run with '--sort', since the seeking
  always takes place before any sorting and this can be confusing.
- Some styling fixes:
  - Remove extra newline
  - Skip braces around single-line if...else clause
  - Add braces around 'if' clause
  - Fix indentation
- Link to v1: https://lore.kernel.org/git/20250701-306-git-for-each-ref-pagination-v1-0-4f0ae7c0688f@xxxxxxxxx/

Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
---
 Documentation/git-for-each-ref.adoc |  10 +-
 builtin/for-each-ref.c              |   8 ++
 ref-filter.c                        | 116 ++++++++++++++-------
 ref-filter.h                        |   1 +
 refs.c                              |   6 +-
 refs.h                              | 155 ++++++++++++++++++++++++++++
 refs/debug.c                        |   7 +-
 refs/files-backend.c                |   7 +-
 refs/iterator.c                     |  26 +++--
 refs/packed-backend.c               |  17 ++--
 refs/ref-cache.c                    |  99 ++++++++++++++----
 refs/ref-cache.h                    |   7 --
 refs/refs-internal.h                | 152 ++--------------------------
 refs/reftable-backend.c             |  21 ++--
 t/t6302-for-each-ref-filter.sh      | 194 ++++++++++++++++++++++++++++++++++++
 15 files changed, 583 insertions(+), 243 deletions(-)

Karthik Nayak (5):
      refs: expose `ref_iterator` via 'refs.h'
      ref-cache: remove unused function 'find_ref_entry()'
      refs: selectively set prefix in the seek functions
      ref-filter: remove unnecessary else clause
      for-each-ref: introduce a '--start-after' option

Range-diff versus v4:

1:  dde167f421 = 1:  f9c9a7fdd9 refs: expose `ref_iterator` via 'refs.h'
2:  e392e93520 = 2:  83bee35517 ref-cache: remove unused function 'find_ref_entry()'
3:  711ffcac00 ! 3:  3b6019a1e7 refs: selectively set prefix in the seek functions
    @@ refs/refs-internal.h: void base_ref_iterator_init(struct ref_iterator *iter,
      /*
     - * Seek the iterator to the first reference matching the given prefix. Should
     - * behave the same as if a new iterator was created with the same prefix.
    -+ * Seek the iterator to the first matching reference. If set_prefix is set,
    -+ * it would behave the same as if a new iterator was created with the same
    -+ * prefix.
    ++ * Seek the iterator to the first matching reference. If the
    ++ * REF_ITERATOR_SEEK_SET_PREFIX flag is set, it would behave the same as if a
    ++ * new iterator was created with the provided refname as prefix.
       */
      typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
     -				 const char *prefix);
-:  ---------- > 4:  3f89eeef26 ref-filter: remove unnecessary else clause
4:  3a0c89acbe ! 5:  7ee7d83cf0 for-each-ref: introduce a '--start-after' option
    @@ ref-filter.c: static int filter_exclude_match(struct ref_filter *filter, const c
     +	strbuf_release(&sb);
     +	return ret;
     +}
    ++
    ++static int for_each_fullref_with_seek(struct ref_filter *filter, each_ref_fn cb,
    ++				      void *cb_data, unsigned int flags)
    ++{
    ++	struct ref_iterator *iter;
    ++	int ret = 0;
    ++
    ++	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
    ++				       NULL, 0, flags);
    ++	if (filter->start_after)
    ++		ret = start_ref_iterator_after(iter, filter->start_after);
    ++
    ++	if (ret)
    ++		return ret;
    ++
    ++	return do_for_each_ref_iterator(iter, cb, cb_data);
    ++}
     +
      /*
       * This is the same as for_each_fullref_in(), but it tries to iterate
       * only over the patterns we'll care about. Note that it _doesn't_ do a full
    @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
    - 				       each_ref_fn cb,
    - 				       void *cb_data)
      {
    -+	struct ref_iterator *iter;
    -+	int flags = 0, ret = 0;
    -+
      	if (filter->kind & FILTER_REFS_ROOT_REFS) {
      		/* In this case, we want to print all refs including root refs. */
     -		return refs_for_each_include_root_refs(get_main_ref_store(the_repository),
     -						       cb, cb_data);
    -+		flags |= DO_FOR_EACH_INCLUDE_ROOT_REFS;
    -+		goto non_prefix_iter;
    ++		return for_each_fullref_with_seek(filter, cb, cb_data,
    ++						  DO_FOR_EACH_INCLUDE_ROOT_REFS);
      	}
      	if (!filter->match_as_path) {
    @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
      		 */
     -		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
     -						"", NULL, cb, cb_data);
    -+		goto non_prefix_iter;
    ++		return for_each_fullref_with_seek(filter, cb, cb_data, 0);
      	}
      	if (filter->ignore_case) {
    @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
      		 */
     -		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
     -						"", NULL, cb, cb_data);
    -+		goto non_prefix_iter;
    ++		return for_each_fullref_with_seek(filter, cb, cb_data, 0);
      	}
      	if (!filter->name_patterns[0]) {
      		/* no patterns; we have to look at everything */
     -		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
     -						"", filter->exclude.v, cb, cb_data);
    -+		goto non_prefix_iter;
    ++		return for_each_fullref_with_seek(filter, cb, cb_data, 0);
      	}
      	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
    - 						 NULL, filter->name_patterns,
    - 						 filter->exclude.v,
    - 						 cb, cb_data);
    -+
    -+non_prefix_iter:
    -+	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
    -+				       NULL, 0, flags);
    -+	if (filter->start_after)
    -+		ret = start_ref_iterator_after(iter, filter->start_after);
    -+
    -+	if (ret)
    -+		return ret;
    -+
    -+	return do_for_each_ref_iterator(iter, cb, cb_data);
    - }
    +@@ ref-filter.c: void filter_is_base(struct repository *r,
    - /*
    -@@ ref-filter.c: static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
    - 	init_contains_cache(&filter->internal.no_contains_cache);
    + static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
    + {
    ++	const char *prefix = NULL;
    + 	int ret = 0;
    - 	/* Simple per-ref filtering */
    --	if (!filter->kind)
    -+	if (!filter->kind) {
    - 		die("filter_refs: invalid type");
    --	else {
    -+	} else {
    -+		const char *prefix = NULL;
    -+
    - 		/*
    - 		 * For common cases where we need only branches or remotes or tags,
    - 		 * we only iterate through those refs. If a mix of refs is needed,
    + 	filter->kind = type & FILTER_REFS_KIND_MASK;
    @@ ref-filter.c: static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
    - 		 * of filter_ref_kind().
    - 		 */
    - 		if (filter->kind == FILTER_REFS_BRANCHES)
    --			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
    --						       "refs/heads/", NULL,
    --						       fn, cb_data);
    -+			prefix = "refs/heads/";
    - 		else if (filter->kind == FILTER_REFS_REMOTES)
    --			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
    --						       "refs/remotes/", NULL,
    --						       fn, cb_data);
    -+			prefix = "refs/remotes/";
    - 		else if (filter->kind == FILTER_REFS_TAGS)
    --			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
    --						       "refs/tags/", NULL, fn,
    --						       cb_data);
    --		else if (filter->kind & FILTER_REFS_REGULAR)
    -+			prefix = "refs/tags/";
    + 	 * of filter_ref_kind().
    + 	 */
    + 	if (filter->kind == FILTER_REFS_BRANCHES)
    +-		ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
    +-					       "refs/heads/", NULL,
    +-					       fn, cb_data);
    ++		prefix = "refs/heads/";
    + 	else if (filter->kind == FILTER_REFS_REMOTES)
    +-		ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
    +-					       "refs/remotes/", NULL,
    +-					       fn, cb_data);
    ++		prefix = "refs/remotes/";
    + 	else if (filter->kind == FILTER_REFS_TAGS)
    +-		ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
    +-					       "refs/tags/", NULL, fn,
    +-					       cb_data);
    +-	else if (filter->kind & FILTER_REFS_REGULAR)
    ++		prefix = "refs/tags/";
     +
    -+		if (prefix) {
    -+			struct ref_iterator *iter;
    ++	if (prefix) {
    ++		struct ref_iterator *iter;
     +
    -+			iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
    -+						       "", NULL, 0, 0);
    ++		iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
    ++					       "", NULL, 0, 0);
     +
    -+			if (filter->start_after)
    -+				ret = start_ref_iterator_after(iter, filter->start_after);
    -+			else if (prefix)
    -+				ret = ref_iterator_seek(iter, prefix, 1);
    ++		if (filter->start_after)
    ++			ret = start_ref_iterator_after(iter, filter->start_after);
    ++		else if (prefix)
    ++			ret = ref_iterator_seek(iter, prefix, 1);
     +
    -+			if (!ret)
    -+				ret = do_for_each_ref_iterator(iter, fn, cb_data);
    -+		} else if (filter->kind & FILTER_REFS_REGULAR) {
    - 			ret = for_each_fullref_in_pattern(filter, fn, cb_data);
    -+		}
    ++		if (!ret)
    ++			ret = do_for_each_ref_iterator(iter, fn, cb_data);
    ++	} else if (filter->kind & FILTER_REFS_REGULAR) {
    + 		ret = for_each_fullref_in_pattern(filter, fn, cb_data);
    ++	}
    - 		/*
    - 		 * When printing all ref types, HEAD is already included,
    + 	/*
    + 	 * When printing all ref types, HEAD is already included,
      ## ref-filter.h ##
     @@ ref-filter.h: struct ref_array {
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --start-after=refs/lost >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/ >actual &&
     +	test_cmp expect actual
     +'
     +
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --start-after=refs/odd/ >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/lost >actual &&
     +	test_cmp expect actual
     +'
     +

base-commit: cf6f63ea6bf35173e02e18bdc6a4ba41288acff9
change-id: 20250605-306-git-for-each-ref-pagination-0ba8a29ae646

Thanks
- Karthik
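
P.S. For readers who want the '--start-after' semantics from the cover
letter in a compact form, here is a minimal standalone sketch. It is not
code from this series: `first_ref_after()` is a hypothetical helper, and
it assumes the references are held in an array sorted byte-wise (as
strcmp() orders them), which is a simplification of how the real backends
store and seek references. It returns the position of the first entry
that sorts strictly after the marker, whether or not the marker itself
exists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Git: given `nr` reference names in
 * `refs`, sorted in strcmp() order, return the index of the first entry
 * that sorts strictly after `marker`, or `nr` if there is none.
 *
 * The marker does not need to be present in the array; an exact match
 * is skipped, which mirrors the "exclude the marker" behavior described
 * for '--start-after'.
 */
size_t first_ref_after(const char **refs, size_t nr, const char *marker)
{
	size_t lo = 0, hi = nr;

	assert(marker);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(refs[mid], marker) <= 0)
			lo = mid + 1;	/* refs[mid] sorts at or before the marker */
		else
			hi = mid;	/* refs[mid] sorts after the marker */
	}
	return lo;
}
```

With the sorted names "refs/heads/a", "refs/heads/c" and "refs/tags/one",
both the existing marker "refs/heads/a" and the non-existent marker
"refs/heads/b" lead to "refs/heads/c" as the first entry of the next
page, while the marker "refs/tags/one" leaves nothing further to list.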