[PATCH v3 0/4] for-each-ref: introduce seeking functionality via '--start-after'

The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it is useful to paginate this output so that iteration can
start from a given reference. This avoids having to iterate over all
references from the beginning each time a new page is requested.

This series adds a '--start-after' option to 'git-for-each-ref(1)'. When
used, the reference iteration seeks to the first reference that follows
the marker lexicographically, so the marker itself is excluded. When
paging, note that references may be deleted, modified or added between
invocations; the output only yields references which follow the marker
lexicographically. If the marker does not exist, output begins from the
first reference that would come after it.

This enables efficient pagination workflows like:
    git for-each-ref --count=100
    git for-each-ref --count=100 --start-after=refs/heads/branch-100
    git for-each-ref --count=100 --start-after=refs/heads/branch-200

To add this functionality, we expose the `ref_iterator` outside the
'refs/' namespace and modify `ref_iterator_seek()` to actually seek to
a given reference, setting the prefix only when the
'REF_ITERATOR_SEEK_SET_PREFIX' flag is passed.

On the reftable and packed backends, the changes are simple. But since
the files backend uses 'ref-cache' for reference handling, the changes
there are a little more involved, since we need to set up the right
levels and the indexing.

Initially I was also planning to clean up all the `refs_for_each...()`
functions in 'refs.h' by simply using the iterator, but this bloated the
series. So I've left that for another day.

Changes in v3:
- Change the behavior of the command to exclude the provided marker, and
  rename the flag from '--skip-until' to '--start-after' to match.
- Extend the documentation to add a note about concurrent modifications
  to the reference database.
- Link to v2: https://lore.kernel.org/r/20250704-306-git-for-each-ref-pagination-v2-0-bcde14acdd81@xxxxxxxxx

Changes in v2:
- Modify 'ref_iterator_seek()' to take in flags instead of a
  'set_prefix' variable. This improves readability, where users would
  use the 'REF_ITERATOR_SEEK_SET_PREFIX' instead of simply passing '1'.
- When the set prefix flag isn't used, reset any previously set prefix.
  This ensures that the internal prefix state is always reset whenever
  we seek and unifies the behavior between 'ref_iterator_seek' and
  'ref_iterator_begin'.
- Don't allow '--skip-until' to be run with '--sort', since the seeking
  always takes place before any sorting and this can be confusing.
- Some styling fixes:
  - Remove extra newline
  - Skip braces around single lined if...else clause
  - Add braces around 'if' clause
  - Fix indentation
- Link to v1: https://lore.kernel.org/git/20250701-306-git-for-each-ref-pagination-v1-0-4f0ae7c0688f@xxxxxxxxx/

Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
---
 Documentation/git-for-each-ref.adoc |  11 +-
 builtin/for-each-ref.c              |   8 ++
 ref-filter.c                        |  80 +++++++++++----
 ref-filter.h                        |   1 +
 refs.c                              |   6 +-
 refs.h                              | 158 +++++++++++++++++++++++++++++
 refs/debug.c                        |   7 +-
 refs/files-backend.c                |   7 +-
 refs/iterator.c                     |  26 +++--
 refs/packed-backend.c               |  17 ++--
 refs/ref-cache.c                    |  99 ++++++++++++++----
 refs/ref-cache.h                    |   7 --
 refs/refs-internal.h                | 152 ++--------------------------
 refs/reftable-backend.c             |  21 ++--
 t/t6302-for-each-ref-filter.sh      | 194 ++++++++++++++++++++++++++++++++++++
 15 files changed, 568 insertions(+), 226 deletions(-)

Karthik Nayak (4):
      refs: expose `ref_iterator` via 'refs.h'
      ref-cache: remove unused function 'find_ref_entry()'
      refs: selectively set prefix in the seek functions
      for-each-ref: introduce a '--start-after' option

Range-diff versus v2:

1:  c0ce873c35 = 1:  dbb03c2aa9 refs: expose `ref_iterator` via 'refs.h'
2:  2c50d1eba2 = 2:  fa5a0cb722 ref-cache: remove unused function 'find_ref_entry()'
3:  fae849749f = 3:  9940d390cc refs: selectively set prefix in the seek functions
4:  a0725a6647 ! 4:  ebe864095a for-each-ref: introduce a '--skip-until' option
    @@ Metadata
     Author: Karthik Nayak <karthik.188@xxxxxxxxx>
     
      ## Commit message ##
    -    for-each-ref: introduce a '--skip-until' option
    +    for-each-ref: introduce a '--start-after' option
     
         The `git-for-each-ref(1)` command is used to iterate over references
         present in a repository. In large repositories with millions of
    @@ Commit message
         through results.
     
         The previous commit added 'seek' functionality to the reference
    -    backends. Utilize this and expose a '--skip-until' option in
    +    backends. Utilize this and expose a '--start-after' option in
         'git-for-each-ref(1)'. When used, the reference iteration seeks to the
    -    first matching reference and iterates from there onward.
    +    lexicographically next reference and iterates from there onward.
     
         This enables efficient pagination workflows like:
             git for-each-ref --count=100
    -        git for-each-ref --count=100 --skip-until=refs/heads/branch-100
    -        git for-each-ref --count=100 --skip-until=refs/heads/branch-200
    +        git for-each-ref --count=100 --start-after=refs/heads/branch-100
    +        git for-each-ref --count=100 --start-after=refs/heads/branch-200
    +
    +    Since the reference iterators only allow seeking to a specified marker
    +    via the `ref_iterator_seek()`, we introduce a helper function
    +    `start_ref_iterator_after()`, which seeks to next reference by simply
    +    adding (char) 1 to the marker.
    +
    +    We must note that pagination always continues from the provided marker,
    +    as such any concurrent reference updates lexicographically behind the
    +    marker will not be output. Document the same.
     
         Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
     
    @@ Documentation/git-for-each-ref.adoc: SYNOPSIS
      		   [--merged[=<object>]] [--no-merged[=<object>]]
      		   [--contains[=<object>]] [--no-contains[=<object>]]
     -		   [--exclude=<pattern> ...]
    -+		   [--exclude=<pattern> ...] [--skip-until=<pattern>]
    ++		   [--exclude=<pattern> ...] [--start-after=<marker>]
      
      DESCRIPTION
      -----------
    @@ Documentation/git-for-each-ref.adoc: TAB %(refname)`.
      --include-root-refs::
      	List root refs (HEAD and pseudorefs) apart from regular refs.
      
    -+--skip-until::
    -+    Skip references up to but excluding the specified pattern. Cannot be used
    -+    with general pattern matching or custom sort options.
    ++--start-after::
    ++    Allows paginating the output by skipping references up to and including the
    ++    specified marker. When paging, it should be noted that references may be
    ++    deleted, modified or added between invocations. Output will only yield those
    ++    references which follow the marker lexicographically. If the marker does not
    ++    exist, output begins from the first reference that would come after it
    ++    alphabetically. Cannot be used with general pattern matching or custom
    ++    sort options.
     +
      FIELD NAMES
      -----------
    @@ builtin/for-each-ref.c: static char const * const for_each_ref_usage[] = {
      	N_("git for-each-ref [--points-at <object>]"),
      	N_("git for-each-ref [--merged [<commit>]] [--no-merged [<commit>]]"),
      	N_("git for-each-ref [--contains [<commit>]] [--no-contains [<commit>]]"),
    -+	N_("git for-each-ref [--skip-until <pattern>]"),
    ++	N_("git for-each-ref [--start-after <marker>]"),
      	NULL
      };
      
    @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc,
      		OPT_GROUP(""),
      		OPT_INTEGER( 0 , "count", &format.array_opts.max_count, N_("show only <n> matched refs")),
      		OPT_STRING(  0 , "format", &format.format, N_("format"), N_("format to use for the output")),
    -+		OPT_STRING(  0 , "skip-until", &filter.seek, N_("skip-until"), N_("skip references until")),
    ++		OPT_STRING(  0 , "start-after", &filter.start_after, N_("start-start"), N_("start iteration after the provided marker")),
      		OPT__COLOR(&format.use_color, N_("respect format colors")),
      		OPT_REF_FILTER_EXCLUDE(&filter),
      		OPT_REF_SORT(&sorting_options),
    @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc,
      	if (verify_ref_format(&format))
      		usage_with_options(for_each_ref_usage, opts);
      
    -+	if (filter.seek && sorting_options.nr > 1)
    -+		die(_("cannot use --skip-until custom sort options"));
    ++	if (filter.start_after && sorting_options.nr > 1)
    ++		die(_("cannot use --start-after with custom sort options"));
     +
      	sorting = ref_sorting_options(&sorting_options);
      	ref_sorting_set_sort_flags_all(sorting, REF_SORTING_ICASE, icase);
    @@ builtin/for-each-ref.c: int cmd_for_each_ref(int argc,
      		filter.name_patterns = argv;
      	}
      
    -+	if (filter.seek && filter.name_patterns && filter.name_patterns[0])
    -+		die(_("cannot use --skip-until with patterns"));
    ++	if (filter.start_after && filter.name_patterns && filter.name_patterns[0])
    ++		die(_("cannot use --start-after with patterns"));
     +
      	if (include_root_refs)
      		flags |= FILTER_REFS_ROOT_REFS | FILTER_REFS_DETACHED_HEAD;
      
     
      ## ref-filter.c ##
    +@@ ref-filter.c: static int filter_exclude_match(struct ref_filter *filter, const char *refname)
    + 	return match_pattern(filter->exclude.v, refname, filter->ignore_case);
    + }
    + 
    ++/*
    ++ * We need to seek to the reference right after a given marker but excluding any
    ++ * matching references. So we seek to the lexicographically next reference.
    ++ */
    ++static int start_ref_iterator_after(struct ref_iterator *iter, const char *marker)
    ++{
    ++	struct strbuf sb = STRBUF_INIT;
    ++	int ret;
    ++
    ++	strbuf_addstr(&sb, marker);
    ++	strbuf_addch(&sb, 1);
    ++
    ++	ret = ref_iterator_seek(iter, sb.buf, 0);
    ++
    ++	strbuf_release(&sb);
    ++	return ret;
    ++}
    ++
    + /*
    +  * This is the same as for_each_fullref_in(), but it tries to iterate
    +  * only over the patterns we'll care about. Note that it _doesn't_ do a full
     @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
      				       each_ref_fn cb,
      				       void *cb_data)
    @@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
     +non_prefix_iter:
     +	iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), "",
     +				       NULL, 0, flags);
    -+	if (filter->seek)
    -+		ret = ref_iterator_seek(iter, filter->seek, 0);
    ++	if (filter->start_after)
    ++		ret = start_ref_iterator_after(iter, filter->start_after);
    ++
     +	if (ret)
     +		return ret;
     +
    @@ ref-filter.c: static int do_filter_refs(struct ref_filter *filter, unsigned int
     +			iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
     +						       "", NULL, 0, 0);
     +
    -+			if (filter->seek)
    -+				ret = ref_iterator_seek(iter, filter->seek, 0);
    ++			if (filter->start_after)
    ++				ret = start_ref_iterator_after(iter, filter->start_after);
     +			else if (prefix)
     +				ret = ref_iterator_seek(iter, prefix, 1);
     +
    @@ ref-filter.h: struct ref_array {
      
      struct ref_filter {
      	const char **name_patterns;
    -+	const char *seek;
    ++	const char *start_after;
      	struct strvec exclude;
      	struct oid_array points_at;
      	struct commit_list *with_commit;
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
      	test_cmp expect actual
      '
      
    -+test_expect_success 'skip until with empty value' '
    ++test_expect_success 'start after with empty value' '
     +	cat >expect <<-\EOF &&
     +	refs/heads/main
     +	refs/heads/main_worktree
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until="" >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after="" >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until to a specific reference' '
    ++test_expect_success 'start after a specific reference' '
     +	cat >expect <<-\EOF &&
    -+	refs/odd/spot
     +	refs/tags/annotated-tag
     +	refs/tags/doubly-annotated-tag
     +	refs/tags/doubly-signed-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/spot >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until to a specific reference with partial match' '
    ++test_expect_success 'start after a specific reference with partial match' '
     +	cat >expect <<-\EOF &&
     +	refs/odd/spot
     +	refs/tags/annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/sp >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/sp >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until just behind a specific reference' '
    ++test_expect_success 'start after, just behind a specific reference' '
     +	cat >expect <<-\EOF &&
     +	refs/odd/spot
     +	refs/tags/annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/parrot >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/parrot >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until to specific directory' '
    ++test_expect_success 'start after with specific directory match' '
     +	cat >expect <<-\EOF &&
     +	refs/odd/spot
     +	refs/tags/annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until to specific directory with trailing slash' '
    ++test_expect_success 'start after with specific directory and trailing slash' '
     +	cat >expect <<-\EOF &&
     +	refs/odd/spot
     +	refs/tags/annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/lost >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/lost >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until just behind a specific directory' '
    ++test_expect_success 'start after, just behind a specific directory' '
     +	cat >expect <<-\EOF &&
     +	refs/odd/spot
     +	refs/tags/annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/ >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/ >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until overflow specific reference length' '
    ++test_expect_success 'start after, overflow specific reference length' '
     +	cat >expect <<-\EOF &&
     +	refs/tags/annotated-tag
     +	refs/tags/doubly-annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/spotnew >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/spotnew >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until overflow specific reference path' '
    ++test_expect_success 'start after, overflow specific reference path' '
     +	cat >expect <<-\EOF &&
     +	refs/tags/annotated-tag
     +	refs/tags/doubly-annotated-tag
    @@ t/t6302-for-each-ref-filter.sh: test_expect_success 'validate worktree atom' '
     +	refs/tags/three
     +	refs/tags/two
     +	EOF
    -+	git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot/new >actual &&
    ++	git for-each-ref --format="%(refname)" --start-after=refs/odd/spot/new >actual &&
    ++	test_cmp expect actual
    ++'
    ++
    ++test_expect_success 'start after, last reference' '
    ++	cat >expect <<-\EOF &&
    ++	EOF
    ++	git for-each-ref --format="%(refname)" --start-after=refs/tags/two >actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until used with a pattern' '
    ++test_expect_success 'start after used with a pattern' '
     +	cat >expect <<-\EOF &&
    -+	fatal: cannot use --skip-until with patterns
    ++	fatal: cannot use --start-after with patterns
     +	EOF
    -+	test_must_fail git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot refs/tags 2>actual &&
    ++	test_must_fail git for-each-ref --format="%(refname)" --start-after=refs/odd/spot refs/tags 2>actual &&
     +	test_cmp expect actual
     +'
     +
    -+test_expect_success 'skip until used with custom sort order' '
    ++test_expect_success 'start after used with custom sort order' '
     +	cat >expect <<-\EOF &&
    -+	fatal: cannot use --skip-until custom sort options
    ++	fatal: cannot use --start-after with custom sort options
     +	EOF
    -+	test_must_fail git for-each-ref --format="%(refname)" --skip-until=refs/odd/spot --sort=author 2>actual &&
    ++	test_must_fail git for-each-ref --format="%(refname)" --start-after=refs/odd/spot --sort=author 2>actual &&
     +	test_cmp expect actual
     +'
     +


base-commit: cf6f63ea6bf35173e02e18bdc6a4ba41288acff9
change-id: 20250605-306-git-for-each-ref-pagination-0ba8a29ae646

Thanks
- Karthik




