Re: [PATCH v5 5/5] for-each-ref: introduce a '--start-after' option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Karthik Nayak <karthik.188@xxxxxxxxx> writes:

> The `git-for-each-ref(1)` command is used to iterate over references
> present in a repository. In large repositories with millions of
> references, it would be optimal to paginate this output such that we
> can start iteration from a given reference. This would avoid having to
> iterate over all references from the beginning each time when paginating
> through results.
>
> The previous commit added 'seek' functionality to the reference
> backends. Utilize this and expose a '--start-after' option in
> 'git-for-each-ref(1)'. When used, the reference iteration seeks to the
> lexicographically next reference and iterates from there onward.
>
> This enables efficient pagination workflows, where the calling script
> can remember the last provided reference and use that as the starting
> point for the next set of references:
>     git for-each-ref --count=100
>     git for-each-ref --count=100 --start-after=refs/heads/branch-100
>     git for-each-ref --count=100 --start-after=refs/heads/branch-200
>
> Since the reference iterators only allow seeking to a specified marker
> via the `ref_iterator_seek()`, we introduce a helper function
> `start_ref_iterator_after()`, which seeks to next reference by simply
> adding (char) 1 to the marker.
>
> We must note that pagination always continues from the provided marker,
> as such any concurrent reference updates lexicographically behind the
> marker will not be output. Document the same.
>
> Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
> ---
>  Documentation/git-for-each-ref.adoc |  10 +-
>  builtin/for-each-ref.c              |   8 ++
>  ref-filter.c                        |  78 +++++++++++----
>  ref-filter.h                        |   1 +
>  t/t6302-for-each-ref-filter.sh      | 194 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 272 insertions(+), 19 deletions(-)
>
> diff --git a/Documentation/git-for-each-ref.adoc b/Documentation/git-for-each-ref.adoc
> index 5ef89fc0fe..ae61ba642a 100644
> --- a/Documentation/git-for-each-ref.adoc
> +++ b/Documentation/git-for-each-ref.adoc
> @@ -14,7 +14,7 @@ SYNOPSIS
>  		   [--points-at=<object>]
>  		   [--merged[=<object>]] [--no-merged[=<object>]]
>  		   [--contains[=<object>]] [--no-contains[=<object>]]
> -		   [--exclude=<pattern> ...]
> +		   [--exclude=<pattern> ...] [--start-after=<marker>]

Not a problem this patch introduces, but as I noticed it, let me
leave a #leftoverbits comment here (it is OK to have a preliminary
clean-up patch).

 * "--exclude=<pattern>" should be enclosed inside a pair of
   (parentheses), just like the way how [(--sort=<key>)...] is
   shown.

 * [--stdin | <pattern>...] should be moved to the end.  There is no
   reason to require "--stdin" to be the end of dashed options, but
   the <pattern>... must be, as they are positional, not dashed.

> +--start-after=<marker>::
> +    Allows paginating the output by skipping references up to and including the
> +    specified marker. When paging, it should be noted that references may be
> +    deleted, modified or added between invocations. Output will only yield those
> +    references which follow the marker lexicographically. Output begins from the
> +    first reference that would come after the marker alphabetically. Cannot be
> +    used with general pattern matching or custom sort options.

It is unclear what "general" in "general pattern matching" refers
to.

    Cannot be used with `--sort=<key>` or `--stdin` options, or
    the _<pattern>_ argument(s) to limit the refs.

or something, perhaps?  It is curious how `--exclude=<pattern>`
interacts with the feature.  Presumably the exclusion is done so
late in the output phase that it does not have any effect?  It does
not have to be mentioned in this documentation if that is the case
as it is a mere implementation detail.  

    Side note.  The limitation that sorting and name_patterns cannot
    be used with the feature also comes from implementation
    (i.e. the name_patterns optimization will compete with this
    feature to take advantage of the "prefix" thing in an
    incompatible way), so while the reason does not have to be
    stated in the end-user facing documentation, the effect needs
    documenting.

> @@ -3189,6 +3221,7 @@ void filter_is_base(struct repository *r,
>  
>  static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
>  {
> +	const char *prefix = NULL;
> ...
> +
> +	if (prefix) {
> +		struct ref_iterator *iter;
> +
> +		iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
> +					       "", NULL, 0, 0);
> +
> +		if (filter->start_after)

The start_after of the filter comes from "--start-after=<mark>".
Can it be true with non-NULL prefix at this point?  Unless you add
support for the option to "git branch/tag", it would not happen, I
guess.

More importantly, when you do add support to "git branch/tag", the
code need to be updated to keep the original prefix while seeking
the cursor to the specified <mark>, instead of clearing it.

> +			ret = start_ref_iterator_after(iter, filter->start_after);
> +		else if (prefix)
> +			ret = ref_iterator_seek(iter, prefix, 1);

We have "REF_ITERATOR_SEEK_SET_PREFIX" for that "1"?




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux