[PATCH v4 0/4] for-each-ref: introduce seeking functionality via '--start-after'

The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it is useful to paginate this output so that iteration can
start from a given reference. This avoids having to iterate over all
references from the beginning each time the caller fetches the next page
of results.

This series adds a '--start-after' option to 'git-for-each-ref(1)'. When
used, reference iteration seeks to the first reference that follows the
marker lexicographically. When paging, note that references may be
deleted, modified or added between invocations; output only yields those
references which follow the marker lexicographically. If the marker does
not exist, output begins from the first reference that would come after
it.
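To illustrate these seek semantics, here is a minimal sketch (not Git's
implementation) that models the reference list as a sorted array and
finds the first refname strictly after the marker with a binary search.
It assumes refnames are ordered by plain byte-wise `strcmp()`; the helper
name `first_after()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the index of the first refname in the
 * sorted array `refs` that compares strictly greater than `marker`, or
 * `nr` if there is none. The marker itself need not be present.
 */
static size_t first_after(const char **refs, size_t nr, const char *marker)
{
	size_t lo = 0, hi = nr;

	assert(marker);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(refs[mid], marker) <= 0)
			lo = mid + 1;	/* refs[mid] is at or before the marker */
		else
			hi = mid;	/* refs[mid] is a candidate */
	}
	return lo;
}
```

With the refnames `refs/heads/branch-1`, `refs/heads/branch-2`,
`refs/heads/main` and `refs/tags/v1.0`, the marker `refs/heads/branch-2`
yields index 2, while the missing marker `refs/heads/branch-10` yields
index 1 (`refs/heads/branch-2`), because byte-wise ordering places
`branch-10` between `branch-1` and `branch-2`.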

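This enables efficient pagination workflows, where the calling script
remembers the last reference it printed and uses it as the starting
point for the next page: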
    git for-each-ref --count=100
    git for-each-ref --count=100 --start-after=refs/heads/branch-100
    git for-each-ref --count=100 --start-after=refs/heads/branch-200
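The calling side of this workflow can be modelled with a small sketch:
each call returns at most `count` refnames following the marker, and the
caller feeds the last refname of one page back in as the next marker.
The helpers `list_page()` and `count_pages()` are hypothetical stand-ins
for invoking the command; they assume a sorted reference list that does
not change between calls, which real repositories do not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of a single
 * "git for-each-ref --count=<count> [--start-after=<marker>]" call over a
 * sorted array of refnames. A NULL marker means "start from the beginning".
 * Stores the index of the page's first refname in *first and returns the
 * number of refnames on the page.
 */
static size_t list_page(const char **refs, size_t nr, const char *marker,
			size_t count, size_t *first)
{
	size_t start = 0;

	assert(count > 0);
	/* Linear scan for brevity: skip everything up to and including marker. */
	if (marker)
		while (start < nr && strcmp(refs[start], marker) <= 0)
			start++;

	*first = start;
	return nr - start < count ? nr - start : count;
}

/*
 * Page through all refnames the way a calling script would: remember the
 * last refname printed and pass it as the next marker. Returns the number
 * of calls that produced output.
 */
static size_t count_pages(const char **refs, size_t nr, size_t count)
{
	const char *marker = NULL;
	size_t pages = 0, first, n;

	while ((n = list_page(refs, nr, marker, count, &first)) > 0) {
		pages++;
		marker = refs[first + n - 1];
	}
	return pages;
}
```

For five refnames and a page size of 2, `count_pages()` returns 3: two
full pages followed by a final page holding a single reference.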

To add this functionality, we expose `ref_iterator` via 'refs.h',
outside the 'refs/' directory, and modify `ref_iterator_seek()` to
actually seek to a given reference, only setting the prefix when the
'REF_ITERATOR_SEEK_SET_PREFIX' flag is passed.
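The intended seek contract can be sketched with a toy iterator over a
sorted array. Everything here is hypothetical (`struct toy_iter`,
`toy_seek()`, `toy_next()`, and `TOY_SEEK_SET_PREFIX`, which only mirrors
'REF_ITERATOR_SEEK_SET_PREFIX'). It assumes that a seek positions the
iterator at the first refname sorting at or after the given string, and
that every seek clears any previously set prefix, as the v2 changelog
describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag; mirrors REF_ITERATOR_SEEK_SET_PREFIX from the series. */
enum toy_seek_flag {
	TOY_SEEK_SET_PREFIX = (1 << 0),
};

/* Hypothetical iterator over a sorted array of refnames. */
struct toy_iter {
	const char **refs;
	size_t nr, pos;
	const char *prefix;	/* borrowed; NULL means no prefix filter */
};

/*
 * Position the iterator at the first refname sorting at or after
 * `refname`. The prefix is always cleared first, and only set again
 * when TOY_SEEK_SET_PREFIX is passed.
 */
static int toy_seek(struct toy_iter *it, const char *refname,
		    unsigned int flags)
{
	assert(it);
	it->prefix = NULL;
	if (flags & TOY_SEEK_SET_PREFIX)
		it->prefix = refname;

	it->pos = 0;
	if (refname && *refname)
		while (it->pos < it->nr && strcmp(it->refs[it->pos], refname) < 0)
			it->pos++;
	return 0;
}

/* Return the next refname, or NULL once iteration is exhausted. */
static const char *toy_next(struct toy_iter *it)
{
	if (it->pos >= it->nr)
		return NULL;
	/* Input is sorted, so the first non-matching refname ends the prefix. */
	if (it->prefix && strncmp(it->refs[it->pos], it->prefix, strlen(it->prefix)))
		return NULL;
	return it->refs[it->pos++];
}
```

Seeking to `refs/heads/` with the flag yields only branches, whereas
seeking to `refs/heads/b` without it clears the prefix and continues into
`refs/tags/`. Storing a borrowed pointer keeps the sketch short; the
series itself duplicates the string with `xstrdup_or_null()`.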

For the reftable and packed backends, the changes are simple. The files
backend, however, uses 'ref-cache' for reference handling, so the
changes there are more involved: seeking requires setting up the right
directory levels and the index within each level.
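To give an idea of why the ref-cache case needs more work, the following
simplified sketch models the cache as a tree of entries (names are full
paths, with directories ending in '/') and sets up one cursor per level
while descending towards the seek target. The types and
`toy_seek_levels()` are hypothetical and do not reproduce the exact
comparisons in 'refs/ref-cache.c'. Each level's `index` is the next entry
to visit there, and a parent's cursor is moved past a directory once a
child level covers it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_LEVELS 16

/* Hypothetical cache entry: full names, directories end in '/'. */
struct toy_entry {
	const char *name;
	const struct toy_entry *children;	/* sorted; NULL for references */
	size_t nr;				/* number of children */
};

/* One level of the iteration stack: a directory and the next index to visit. */
struct toy_level {
	const struct toy_entry *dir;
	size_t index;
};

/*
 * Descend from `root` towards `refname`, recording one cursor per level
 * in `levels` (which must hold TOY_MAX_LEVELS entries). Each cursor skips
 * entries that sort before the matching prefix of `refname`. Returns the
 * number of levels set up.
 */
static size_t toy_seek_levels(const struct toy_entry *root,
			      const char *refname, struct toy_level *levels)
{
	const struct toy_entry *dir = root;
	const char *slash = refname;
	size_t nr = 0;

	assert(root && refname && levels);
	while (nr < TOY_MAX_LEVELS) {
		size_t idx, len;

		/* Compare up to and including the current component's '/'. */
		slash = strchr(slash, '/');
		len = slash ? (size_t)(slash - refname) + 1 : strlen(refname);

		for (idx = 0; idx < dir->nr; idx++)
			if (strncmp(refname, dir->children[idx].name, len) <= 0)
				break;

		levels[nr].dir = dir;
		levels[nr].index = idx;
		nr++;

		/* Stop unless we landed exactly on a directory on the path. */
		if (!slash || idx == dir->nr || !dir->children[idx].children ||
		    strncmp(refname, dir->children[idx].name, len))
			break;

		/* The child level now covers this directory; skip it here. */
		levels[nr - 1].index = idx + 1;
		dir = &dir->children[idx];
		slash++;
	}
	return nr;
}
```

Seeking to `refs/heads/main` in a tree holding `refs/heads/a`,
`refs/heads/main`, `refs/heads/z` and `refs/tags/v1` sets up three
levels: the innermost cursor sits on `refs/heads/main`, and the `refs/`
level already points at `refs/tags/`, so iteration resumes there once the
branches are exhausted.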

Initially I was also planning to clean up all the `refs_for_each...()`
functions in 'refs.h' by simply using the iterator, but this bloated the
series. So I've left that for another day.

Changes in v4:
- Patch 3/4: Move the documentation for the flag to the enum and rename
  the 'seek' parameter to 'refname'.
- Patch 4/4: Clean up the commit message and the documentation.
- Link to v3: https://lore.kernel.org/r/20250708-306-git-for-each-ref-pagination-v3-0-8cfba1080be4@xxxxxxxxx

Changes in v3:
- Change the behavior of the command to exclude the provided marker, and
  rename the flag to '--start-after' accordingly.
- Extend the documentation to add a note about concurrent modifications
  to the reference database.
- Link to v2: https://lore.kernel.org/r/20250704-306-git-for-each-ref-pagination-v2-0-bcde14acdd81@xxxxxxxxx

Changes in v2:
- Modify 'ref_iterator_seek()' to take in flags instead of a
  'set_prefix' variable. This improves readability, where users would
  use the 'REF_ITERATOR_SEEK_SET_PREFIX' instead of simply passing '1'.
- When the set prefix flag isn't used, reset any previously set prefix.
  This ensures that the internal prefix state is always reset whenever
  we seek and unifies the behavior between 'ref_iterator_seek' and
  'ref_iterator_begin'.
- Don't allow '--skip-until' to be run with '--sort', since the seeking
  always takes place before any sorting and this can be confusing.
- Some styling fixes:
  - Remove extra newline
  - Skip braces around single-line if...else clause
  - Add braces around 'if' clause
  - Fix indentation
- Link to v1: https://lore.kernel.org/git/20250701-306-git-for-each-ref-pagination-v1-0-4f0ae7c0688f@xxxxxxxxx/

Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
---
 Documentation/git-for-each-ref.adoc |  10 +-
 builtin/for-each-ref.c              |   8 ++
 ref-filter.c                        |  80 +++++++++++----
 ref-filter.h                        |   1 +
 refs.c                              |   6 +-
 refs.h                              | 155 ++++++++++++++++++++++++++++
 refs/debug.c                        |   7 +-
 refs/files-backend.c                |   7 +-
 refs/iterator.c                     |  26 +++--
 refs/packed-backend.c               |  17 ++--
 refs/ref-cache.c                    |  99 ++++++++++++++----
 refs/ref-cache.h                    |   7 --
 refs/refs-internal.h                | 152 ++--------------------------
 refs/reftable-backend.c             |  21 ++--
 t/t6302-for-each-ref-filter.sh      | 194 ++++++++++++++++++++++++++++++++++++
 15 files changed, 564 insertions(+), 226 deletions(-)

Karthik Nayak (4):
      refs: expose `ref_iterator` via 'refs.h'
      ref-cache: remove unused function 'find_ref_entry()'
      refs: selectively set prefix in the seek functions
      for-each-ref: introduce a '--start-after' option

Range-diff versus v3:

1:  eed39162f5 = 1:  9e6ecff291 refs: expose `ref_iterator` via 'refs.h'
2:  b9db49d31b = 2:  22f5222e4f ref-cache: remove unused function 'find_ref_entry()'
3:  502e2696fd ! 3:  0e71d8ffd9 refs: selectively set prefix in the seek functions
    @@ refs.h: struct ref_iterator *refs_ref_iterator_begin(
      
     +enum ref_iterator_seek_flag {
     +	/*
    -+	 * Also set the seek pattern as a prefix for iteration. This ensures
    -+	 * that only references which match the prefix are yielded.
    ++	 * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is
    ++	 * updated to match the provided string, affecting all subsequent iterations. If
    ++	 * not, the iterator seeks to the specified reference and clears any previously
    ++	 * set prefix.
     +	 */
     +	REF_ITERATOR_SEEK_SET_PREFIX = (1 << 0),
     +};
    @@ refs.h: struct ref_iterator *refs_ref_iterator_begin(
     - * passed when creating the iterator will remain unchanged.
     + * This function is expected to behave as if a new ref iterator has been
     + * created, but allows reuse of existing iterators for optimization.
    -+ *
    -+ * When the REF_ITERATOR_SEEK_SET_PREFIX flag is set, the iterator's prefix is
    -+ * updated to match the seek string, affecting all subsequent iterations. If
    -+ * not, the iterator seeks to the specified reference and clears any previously
    -+ * set prefix.
       *
       * Returns 0 on success, a negative error code otherwise.
       */
     -int ref_iterator_seek(struct ref_iterator *ref_iterator,
     -		      const char *prefix);
    -+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
    ++int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *refname,
     +		      unsigned int flags);
      
      /*
    @@ refs/debug.c: static int debug_ref_iterator_advance(struct ref_iterator *ref_ite
      
      static int debug_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				   const char *prefix)
    -+				   const char *seek, unsigned int flags)
    ++				   const char *refname, unsigned int flags)
      {
      	struct debug_ref_iterator *diter =
      		(struct debug_ref_iterator *)ref_iterator;
     -	int res = diter->iter->vtable->seek(diter->iter, prefix);
     -	trace_printf_key(&trace_refs, "iterator_seek: %s: %d\n", prefix ? prefix : "", res);
    -+	int res = diter->iter->vtable->seek(diter->iter, seek, flags);
    ++	int res = diter->iter->vtable->seek(diter->iter, refname, flags);
     +	trace_printf_key(&trace_refs, "iterator_seek: %s flags: %d: %d\n",
    -+			 seek ? seek : "", flags, res);
    ++			 refname ? refname : "", flags, res);
      	return res;
      }
      
    @@ refs/files-backend.c: static int files_ref_iterator_advance(struct ref_iterator
      
      static int files_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				   const char *prefix)
    -+				   const char *seek, unsigned int flags)
    ++				   const char *refname, unsigned int flags)
      {
      	struct files_ref_iterator *iter =
      		(struct files_ref_iterator *)ref_iterator;
     -	return ref_iterator_seek(iter->iter0, prefix);
    -+	return ref_iterator_seek(iter->iter0, seek, flags);
    ++	return ref_iterator_seek(iter->iter0, refname, flags);
      }
      
      static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
    @@ refs/files-backend.c: static int files_reflog_iterator_advance(struct ref_iterat
      
      static int files_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
     -				      const char *prefix UNUSED)
    -+				      const char *seek UNUSED,
    ++				      const char *refname UNUSED,
     +				      unsigned int flags UNUSED)
      {
      	BUG("ref_iterator_seek() called for reflog_iterator");
    @@ refs/iterator.c: int ref_iterator_advance(struct ref_iterator *ref_iterator)
      
     -int ref_iterator_seek(struct ref_iterator *ref_iterator,
     -		      const char *prefix)
    -+int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *seek,
    ++int ref_iterator_seek(struct ref_iterator *ref_iterator, const char *refname,
     +		      unsigned int flags)
      {
     -	return ref_iterator->vtable->seek(ref_iterator, prefix);
    -+	return ref_iterator->vtable->seek(ref_iterator, seek, flags);
    ++	return ref_iterator->vtable->seek(ref_iterator, refname, flags);
      }
      
      int ref_iterator_peel(struct ref_iterator *ref_iterator,
    @@ refs/iterator.c: static int empty_ref_iterator_advance(struct ref_iterator *ref_
      
      static int empty_ref_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
     -				   const char *prefix UNUSED)
    -+				   const char *seek UNUSED,
    ++				   const char *refname UNUSED,
     +				   unsigned int flags UNUSED)
      {
      	return 0;
    @@ refs/iterator.c: static int merge_ref_iterator_advance(struct ref_iterator *ref_
      
      static int merge_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				   const char *prefix)
    -+				   const char *seek, unsigned int flags)
    ++				   const char *refname, unsigned int flags)
      {
      	struct merge_ref_iterator *iter =
      		(struct merge_ref_iterator *)ref_iterator;
    @@ refs/iterator.c: static int merge_ref_iterator_seek(struct ref_iterator *ref_ite
      	iter->iter1 = iter->iter1_owned;
      
     -	ret = ref_iterator_seek(iter->iter0, prefix);
    -+	ret = ref_iterator_seek(iter->iter0, seek, flags);
    ++	ret = ref_iterator_seek(iter->iter0, refname, flags);
      	if (ret < 0)
      		return ret;
      
     -	ret = ref_iterator_seek(iter->iter1, prefix);
    -+	ret = ref_iterator_seek(iter->iter1, seek, flags);
    ++	ret = ref_iterator_seek(iter->iter1, refname, flags);
      	if (ret < 0)
      		return ret;
      
    @@ refs/iterator.c: static int prefix_ref_iterator_advance(struct ref_iterator *ref
      
      static int prefix_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				    const char *prefix)
    -+				    const char *seek, unsigned int flags)
    ++				    const char *refname, unsigned int flags)
      {
      	struct prefix_ref_iterator *iter =
      		(struct prefix_ref_iterator *)ref_iterator;
    @@ refs/iterator.c: static int prefix_ref_iterator_advance(struct ref_iterator *ref
     +
     +	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
     +		free(iter->prefix);
    -+		iter->prefix = xstrdup_or_null(seek);
    ++		iter->prefix = xstrdup_or_null(refname);
     +	}
    -+	return ref_iterator_seek(iter->iter0, seek, flags);
    ++	return ref_iterator_seek(iter->iter0, refname, flags);
      }
      
      static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
    @@ refs/packed-backend.c: static int packed_ref_iterator_advance(struct ref_iterato
      
      static int packed_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				    const char *prefix)
    -+				    const char *seek, unsigned int flags)
    ++				    const char *refname, unsigned int flags)
      {
      	struct packed_ref_iterator *iter =
      		(struct packed_ref_iterator *)ref_iterator;
    @@ refs/packed-backend.c: static int packed_ref_iterator_advance(struct ref_iterato
      
     -	if (prefix && *prefix)
     -		start = find_reference_location(iter->snapshot, prefix, 0);
    -+	if (seek && *seek)
    -+		start = find_reference_location(iter->snapshot, seek, 0);
    ++	if (refname && *refname)
    ++		start = find_reference_location(iter->snapshot, refname, 0);
      	else
      		start = iter->snapshot->start;
      
    @@ refs/packed-backend.c: static int packed_ref_iterator_advance(struct ref_iterato
     +	FREE_AND_NULL(iter->prefix);
     +
     +	if (flags & REF_ITERATOR_SEEK_SET_PREFIX)
    -+		iter->prefix = xstrdup_or_null(seek);
    ++		iter->prefix = xstrdup_or_null(refname);
     +
      	iter->pos = start;
      	iter->eof = iter->snapshot->eof;
    @@ refs/ref-cache.c: static int cache_ref_iterator_seek(struct ref_iterator *ref_it
      }
      
     +static int cache_ref_iterator_seek(struct ref_iterator *ref_iterator,
    -+				   const char *seek, unsigned int flags)
    ++				   const char *refname, unsigned int flags)
     +{
     +	struct cache_ref_iterator *iter =
     +		(struct cache_ref_iterator *)ref_iterator;
     +
     +	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
    -+		return cache_ref_iterator_set_prefix(iter, seek);
    -+	} else if (seek && *seek) {
    ++		return cache_ref_iterator_set_prefix(iter, refname);
    ++	} else if (refname && *refname) {
     +		struct cache_ref_iterator_level *level;
    -+		const char *slash = seek;
    ++		const char *slash = refname;
     +		struct ref_dir *dir;
     +
     +		dir = get_ref_dir(iter->cache->root);
     +
     +		if (iter->prime_dir)
    -+			prime_ref_dir(dir, seek);
    ++			prime_ref_dir(dir, refname);
     +
     +		iter->levels_nr = 1;
     +		level = &iter->levels[0];
    @@ refs/ref-cache.c: static int cache_ref_iterator_seek(struct ref_iterator *ref_it
     +			sort_ref_dir(dir);
     +
     +			slash = strchr(slash, '/');
    -+			len = slash ? slash - seek : (int)strlen(seek);
    ++			len = slash ? slash - refname : (int)strlen(refname);
     +
     +			for (idx = 0; idx < dir->nr; idx++) {
    -+				cmp = strncmp(seek, dir->entries[idx]->name, len);
    ++				cmp = strncmp(refname, dir->entries[idx]->name, len);
     +				if (cmp <= 0)
     +					break;
     +			}
    @@ refs/refs-internal.h: void base_ref_iterator_init(struct ref_iterator *iter,
       */
      typedef int ref_iterator_seek_fn(struct ref_iterator *ref_iterator,
     -				 const char *prefix);
    -+				 const char *seek, unsigned int flags);
    ++				 const char *refname, unsigned int flags);
      
      /*
       * Peels the current ref, returning 0 for success or -1 for failure.
    @@ refs/reftable-backend.c: static int reftable_ref_iterator_advance(struct ref_ite
      
      static int reftable_ref_iterator_seek(struct ref_iterator *ref_iterator,
     -				      const char *prefix)
    -+				      const char *seek, unsigned int flags)
    ++				      const char *refname, unsigned int flags)
      {
      	struct reftable_ref_iterator *iter =
      		(struct reftable_ref_iterator *)ref_iterator;
    @@ refs/reftable-backend.c: static int reftable_ref_iterator_advance(struct ref_ite
     +	iter->prefix_len = 0;
     +
     +	if (flags & REF_ITERATOR_SEEK_SET_PREFIX) {
    -+		iter->prefix = xstrdup_or_null(seek);
    -+		iter->prefix_len = seek ? strlen(seek) : 0;
    ++		iter->prefix = xstrdup_or_null(refname);
    ++		iter->prefix_len = refname ? strlen(refname) : 0;
     +	}
    -+	iter->err = reftable_iterator_seek_ref(&iter->iter, seek);
    ++	iter->err = reftable_iterator_seek_ref(&iter->iter, refname);
      
      	return iter->err;
      }
    @@ refs/reftable-backend.c: static int reftable_reflog_iterator_advance(struct ref_
      
      static int reftable_reflog_iterator_seek(struct ref_iterator *ref_iterator UNUSED,
     -					 const char *prefix UNUSED)
    -+					 const char *seek UNUSED,
    ++					 const char *refname UNUSED,
     +					 unsigned int flags UNUSED)
      {
      	BUG("reftable reflog iterator cannot be seeked");
4:  a571579886 ! 4:  e4e9dddd15 for-each-ref: introduce a '--start-after' option
    @@ Commit message
         'git-for-each-ref(1)'. When used, the reference iteration seeks to the
         lexicographically next reference and iterates from there onward.
     
    -    This enables efficient pagination workflows like:
    +    This enables efficient pagination workflows, where the calling script
    +    can remember the last provided reference and use that as the starting
    +    point for the next set of references:
             git for-each-ref --count=100
             git for-each-ref --count=100 --start-after=refs/heads/branch-100
             git for-each-ref --count=100 --start-after=refs/heads/branch-200
    @@ Documentation/git-for-each-ref.adoc: TAB %(refname)`.
      --include-root-refs::
      	List root refs (HEAD and pseudorefs) apart from regular refs.
      
    -+--start-after::
    ++--start-after=<marker>::
     +    Allows paginating the output by skipping references up to and including the
     +    specified marker. When paging, it should be noted that references may be
     +    deleted, modified or added between invocations. Output will only yield those
    -+    references which follow the marker lexicographically. If the marker does not
    -+    exist, output begins from the first reference that would come after it
    -+    alphabetically. Cannot be used with general pattern matching or custom
    -+    sort options.
    ++    references which follow the marker lexicographically. Output begins from the
    ++    first reference that would come after the marker alphabetically. Cannot be
    ++    used with general pattern matching or custom sort options.
     +
      FIELD NAMES
      -----------


base-commit: cf6f63ea6bf35173e02e18bdc6a4ba41288acff9
change-id: 20250605-306-git-for-each-ref-pagination-0ba8a29ae646

Thanks
- Karthik