Re: [PATCH v5 5/5] for-each-ref: introduce a '--start-after' option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <gitster@xxxxxxxxx> writes:

> Karthik Nayak <karthik.188@xxxxxxxxx> writes:
>
>> The `git-for-each-ref(1)` command is used to iterate over references
>> present in a repository. In large repositories with millions of
>> references, it would be optimal to paginate this output such that we
>> can start iteration from a given reference. This would avoid having to
>> iterate over all references from the beginning each time when paginating
>> through results.
>>
>> The previous commit added 'seek' functionality to the reference
>> backends. Utilize this and expose a '--start-after' option in
>> 'git-for-each-ref(1)'. When used, the reference iteration seeks to the
>> lexicographically next reference and iterates from there onward.
>>
>> This enables efficient pagination workflows, where the calling script
>> can remember the last provided reference and use that as the starting
>> point for the next set of references:
>>     git for-each-ref --count=100
>>     git for-each-ref --count=100 --start-after=refs/heads/branch-100
>>     git for-each-ref --count=100 --start-after=refs/heads/branch-200
>>
>> Since the reference iterators only allow seeking to a specified marker
>> via the `ref_iterator_seek()`, we introduce a helper function
>> `start_ref_iterator_after()`, which seeks to next reference by simply
>> adding (char) 1 to the marker.
>>
>> We must note that pagination always continues from the provided marker,
>> as such any concurrent reference updates lexicographically behind the
>> marker will not be output. Document the same.
>>
>> Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
>> ---
>>  Documentation/git-for-each-ref.adoc |  10 +-
>>  builtin/for-each-ref.c              |   8 ++
>>  ref-filter.c                        |  78 +++++++++++----
>>  ref-filter.h                        |   1 +
>>  t/t6302-for-each-ref-filter.sh      | 194 ++++++++++++++++++++++++++++++++++++
>>  5 files changed, 272 insertions(+), 19 deletions(-)
>>
>> diff --git a/Documentation/git-for-each-ref.adoc b/Documentation/git-for-each-ref.adoc
>> index 5ef89fc0fe..ae61ba642a 100644
>> --- a/Documentation/git-for-each-ref.adoc
>> +++ b/Documentation/git-for-each-ref.adoc
>> @@ -14,7 +14,7 @@ SYNOPSIS
>>  		   [--points-at=<object>]
>>  		   [--merged[=<object>]] [--no-merged[=<object>]]
>>  		   [--contains[=<object>]] [--no-contains[=<object>]]
>> -		   [--exclude=<pattern> ...]
>> +		   [--exclude=<pattern> ...] [--start-after=<marker>]
>
> Not a problem this patch introduces, but as I noticed it, let me
> leave a #leftoverbits comment here (it is OK to have a preliminary
> clean-up patch).
>
>  * "--exclude=<pattern>" should be enclosed inside a pair of
>    (parentheses), just like the way how [(--sort=<key>)...] is
>    shown.
>
>  * [--stdin | <pattern>...] should be moved to the end.  There is no
>    reason to require "--stdin" to be the end of dashed options, but
>    the <pattern>... must be, as they are positional, not dashed.
>

I'm sending in a series of these small fixes, so I'll add this in.

>> +--start-after=<marker>::
>> +    Allows paginating the output by skipping references up to and including the
>> +    specified marker. When paging, it should be noted that references may be
>> +    deleted, modified or added between invocations. Output will only yield those
>> +    references which follow the marker lexicographically. Output begins from the
>> +    first reference that would come after the marker alphabetically. Cannot be
>> +    used with general pattern matching or custom sort options.
>
> It is unclear what "general" in "general pattern matching" refers
> to.
>
>     Cannot be used with `--sort=<key>` or `--stdin` options, or
>     the _<pattern>_ argument(s) to limit the refs.
>

This does read much better. I'll also add this.

> or something, perhaps?  It is curious how `--exclude=<pattern>`
> interacts with the feature.  Presumably the exclusion is done so
> late in the output phase that it does not have any effect?  It does
> not have to be mentioned in this documentation if that is the case
> as it is a mere implementation detail.

That is correct indeed, while this doesn't have to be documented, I
think we can merit from a test. So I'll add that in.

>
>     Side note.  The limitation that sorting and name_patterns cannot
>     be used with the feature also comes from implementation
>     (i.e. the name_patterns optimization will compete with this
>     feature to take advantage of the "prefix" thing in an
>     incompatible way), so while the reason does not have to be
>     stated in the end-user facing documentation, the effect needs
>     documenting.
>
>> @@ -3189,6 +3221,7 @@ void filter_is_base(struct repository *r,
>>
>>  static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
>>  {
>> +	const char *prefix = NULL;
>> ...
>> +
>> +	if (prefix) {
>> +		struct ref_iterator *iter;
>> +
>> +		iter = refs_ref_iterator_begin(get_main_ref_store(the_repository),
>> +					       "", NULL, 0, 0);
>> +
>> +		if (filter->start_after)
>
> The start_after of the filter comes from "--start-after=<mark>".
> Can it be true with non-NULL prefix at this point?  Unless you add
> support for the option to "git branch/tag", it would not happen, I
> guess.
>
> More importantly, when you do add support to "git branch/tag", the
> code need to be updated to keep the original prefix while seeking
> the cursor to the specified <mark>, instead of clearing it.
>

Exactly, if we do add '<pattern>' and '--start-after' compatibility,
we'll have to make that change.

>> +			ret = start_ref_iterator_after(iter, filter->start_after);
>> +		else if (prefix)
>> +			ret = ref_iterator_seek(iter, prefix, 1);
>
> We have "REF_ITERATOR_SEEK_SET_PREFIX" for that "1"?

Yup, also the 'if (prefix)' can be dropped too.

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux