Junio C Hamano <gitster@xxxxxxxxx> writes: > Karthik Nayak <karthik.188@xxxxxxxxx> writes: > >> The `git-for-each-ref(1)` command is used to iterate over references >> present in a repository. In large repositories with millions of >> references, it would be optimal to paginate this output such that we >> can start iteration from a given reference. This would avoid having to >> iterate over all references from the beginning each time when paginating >> through results. >> >> The previous commit added 'seek' functionality to the reference >> backends. Utilize this and expose a '--start-after' option in >> 'git-for-each-ref(1)'. When used, the reference iteration seeks to the >> lexicographically next reference and iterates from there onward. >> >> This enables efficient pagination workflows, where the calling script >> can remember the last provided reference and use that as the starting >> point for the next set of references: >> git for-each-ref --count=100 >> git for-each-ref --count=100 --start-after=refs/heads/branch-100 >> git for-each-ref --count=100 --start-after=refs/heads/branch-200 >> >> Since the reference iterators only allow seeking to a specified marker >> via the `ref_iterator_seek()`, we introduce a helper function >> `start_ref_iterator_after()`, which seeks to next reference by simply >> adding (char) 1 to the marker. >> >> We must note that pagination always continues from the provided marker, >> as such any concurrent reference updates lexicographically behind the >> marker will not be output. Document the same. >> >> Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx> >> --- >> Documentation/git-for-each-ref.adoc | 10 +- >> builtin/for-each-ref.c | 8 ++ >> ref-filter.c | 78 +++++++++++---- >> ref-filter.h | 1 + >> t/t6302-for-each-ref-filter.sh | 194 ++++++++++++++++++++++++++++++++++++ >> 5 files changed, 272 insertions(+), 19 deletions(-) >> >> diff --git a/Documentation/git-for-each-ref.adoc b/Documentation/git-for-each-ref.adoc >> index 5ef89fc0fe..ae61ba642a 100644 >> --- a/Documentation/git-for-each-ref.adoc >> +++ b/Documentation/git-for-each-ref.adoc >> @@ -14,7 +14,7 @@ SYNOPSIS >> [--points-at=<object>] >> [--merged[=<object>]] [--no-merged[=<object>]] >> [--contains[=<object>]] [--no-contains[=<object>]] >> - [--exclude=<pattern> ...] >> + [--exclude=<pattern> ...] [--start-after=<marker>] > > Not a problem this patch introduces, but as I noticed it, let me > leave a #leftoverbits comment here (it is OK to have a preliminary > clean-up patch). > > * "--exclude=<pattern>" should be enclosed inside a pair of > (parentheses), just like the way how [(--sort=<key>)...] is > shown. > > * [--stdin | <pattern>...] should be moved to the end. There is no > reason to require "--stdin" to be the end of dashed options, but > the <pattern>... must be, as they are positional, not dashed. > I'm sending in a series of these small fixes, so I'll add this in. >> +--start-after=<marker>:: >> + Allows paginating the output by skipping references up to and including the >> + specified marker. When paging, it should be noted that references may be >> + deleted, modified or added between invocations. Output will only yield those >> + references which follow the marker lexicographically. Output begins from the >> + first reference that would come after the marker alphabetically. Cannot be >> + used with general pattern matching or custom sort options. > > It is unclear what "general" in "general pattern matching" refers > to. > > Cannot be used with `--sort=<key>` or `--stdin` options, or > the _<pattern>_ argument(s) to limit the refs. > This does read much better. I'll also add this. > or something, perhaps? It is curious how `--exclude=<pattern>` > interacts with the feature. Presumably the exclusion is done so > late in the output phase that it does not have any effect? It does > not have to be mentioned in this documentation if that is the case > as it is a mere implementation detail. That is correct indeed, while this doesn't have to be documented, I think we can merit from a test. So I'll add that in. > > Side note. The limitation that sorting and name_patterns cannot > be used with the feature also comes from implementation > (i.e. the name_patterns optimization will compete with this > feature to take advantage of the "prefix" thing in an > incompatible way), so while the reason does not have to be > stated in the end-user facing documentation, the effect needs > documenting. > >> @@ -3189,6 +3221,7 @@ void filter_is_base(struct repository *r, >> >> static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data) >> { >> + const char *prefix = NULL; >> ... >> + >> + if (prefix) { >> + struct ref_iterator *iter; >> + >> + iter = refs_ref_iterator_begin(get_main_ref_store(the_repository), >> + "", NULL, 0, 0); >> + >> + if (filter->start_after) > > The start_after of the filter comes from "--start-after=<mark>". > Can it be true with non-NULL prefix at this point? Unless you add > support for the option to "git branch/tag", it would not happen, I > guess. > > More importantly, when you do add support to "git branch/tag", the > code need to be updated to keep the original prefix while seeking > the cursor to the specified <mark>, instead of clearing it. > Exactly, if we do add '<pattern>' and '--start-after' compatibility, we'll have to make that change. >> + ret = start_ref_iterator_after(iter, filter->start_after); >> + else if (prefix) >> + ret = ref_iterator_seek(iter, prefix, 1); > > We have "REF_ITERATOR_SEEK_SET_PREFIX" for that "1"? Yup, also the 'if (prefix)' can be dropped too.
Attachment:
signature.asc
Description: PGP signature