Re: [GSoC PATCH v5 2/5] repo: add the field references.format

Lucas Seiki Oshiro <lucasseikioshiro@xxxxxxxxx> · Thu, 31 Jul 2025 16:39:57 -0300

> Based upon the implementation, I can see that the user must type the
> key in "dotted" form:
> 
>    git repo info references.format

Agreed, that's important information that was missing from this
documentation.

> I don't think I would figure it out easily. Perhaps hand-holding the
> user by giving an example would help.

Looks like a good idea. I'll add it in the 4th patch of this patchset,
so we'll have an example with more than one field.

> How can we ensure that the lexicographical-order requirement won't
> break?

Good point. We don't ensure it through tests. I plan to add an --all
flag to retrieve all the fields. With that --all flag I can iterate
and check whether the keys are in the correct order.
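
The ordering check itself is small. A rough sketch in C, only to
illustrate the invariant: the helper name `keys_are_sorted` is made up
and is not part of the patch, and the real check would more likely be a
test driving the planned --all flag.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if keys[0..nr) is in strictly
 * ascending strcmp() order (so duplicates also count as a failure),
 * 0 otherwise.
 */
static int keys_are_sorted(const char **keys, size_t nr)
{
	size_t i;

	for (i = 1; i < nr; i++)
		if (strcmp(keys[i - 1], keys[i]) >= 0)
			return 0;
	return 1;
}
```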

> Also, this requirement does feel like a premature optimization. Do
> you expect this list to become so huge and the corresponding lookup
> function to be called so frequently that a simple brute-force linear
> search would be too slow?

It won't be big. My plan for this GSoC is to add the object format
and 9 path-related values, but of course, someone may add more stuff
to this command in the future.

About algorithm complexity, it isn't something that I'm really worried
about, but I also don't want to leave some nested loops with strcmps.
If I'm not mistaken, this is the complexity of the operations here:

- Sorting the requested keys: O(n*s*log(n))
- Searching the keys: O(s*log(m))
- Searching all the requested keys: O(n*s*log(m))
- The current solution: O(n*s*(log(m) + log(n)))
- The complexity of brute-forcing would be O(n*m*s)

where:

- n is the number of the requested fields
- m is the number of available fields
- s is the length of the largest requested key

which I don't expect to be too big.
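
To make the O(s*log(m)) lookup concrete, here is a minimal sketch of a
binary search over a sorted field table. It is not the code from the
patch: `struct field`, `fields`, `field_cmp` and `find_field` are
hypothetical names, and the values are hard-coded placeholders.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical field table; it must stay sorted by key for bsearch(). */
struct field {
	const char *key;
	const char *value; /* placeholder; a real command would compute it */
};

static const struct field fields[] = {
	{ "layout.bare", "false" },
	{ "layout.shallow", "false" },
	{ "references.format", "files" },
};

/* bsearch() passes the searched key first and a table element second. */
static int field_cmp(const void *key, const void *elem)
{
	return strcmp(key, ((const struct field *)elem)->key);
}

/* O(s*log(m)) lookup; returns NULL when the key is unknown. */
static const struct field *find_field(const char *key)
{
	return bsearch(key, fields, sizeof(fields) / sizeof(fields[0]),
		       sizeof(fields[0]), field_cmp);
}
```

A brute-force version would replace bsearch() with a loop over the
whole table, which is the O(m*s) per-key cost mentioned above.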

Another thing I should point out here is that I also plan to add a
feature for requesting the name of a group of keys and getting back
all of its values. For example:

  $ git repo info layout
  layout.bare=true
  layout.shallow=false

Having everything sorted will make this easier.
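
The reason is that, in a sorted list, all keys sharing the prefix
"<group>." are adjacent, so a group is just a contiguous run. A sketch
under that assumption (the helper `find_group` is hypothetical and not
part of this series):

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: in a sorted key list, find the run of keys of
 * the form "<group>.<name>". Stores the index of the first match in
 * *first (left untouched when nothing matches) and returns the number
 * of matching keys.
 */
static size_t find_group(const char **keys, size_t nr,
			 const char *group, size_t *first)
{
	size_t len = strlen(group);
	size_t i, count = 0;

	for (i = 0; i < nr; i++) {
		if (!strncmp(keys[i], group, len) && keys[i][len] == '.') {
			if (!count)
				*first = i;
			count++;
		} else if (count) {
			break; /* sorted input: the run has ended */
		}
	}
	return count;
}
```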

> I can see from the implementation that you are sorting the incoming
> arguments in order to detect and fold out duplicates.

Yes, that's the main idea. In the previous versions (where we also
had a JSON version), this was done in a more hacky way. Actually,
sorting the values was a suggestion to make it simpler.

> However, that raises a couple questions. First, is it really a good
> idea to do something other than what the user asked for?

In this case, the user isn't asking too much, so we're free here. For
example, in git-rev-parse the data is returned in the correct order.

> Second, if this is a good idea, then should the behavior be documented?

Of course, I'll do that!

>    struct strbuf value = STRBUF_INIT;
>    for (...) {
>        strbuf_reset(&value);
>        ...
>        if (error_condition) {
>            strbuf_release(...);
>            return error(...);
>        }
>       ...
>    }
>    strbuf_release(...);

Much better, thanks!

> Would the user-experience be
> improved by instead continuing the loop even after reporting an error,
> and then adjusting the final `return 0` to conditionally return
> success or error depending upon whether any keys were unrecognized?

It seems ok to me, since we're printing some values even if there is an
invalid key.
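
Roughly, the loop would then look like the sketch below. This is only
an illustration of the control flow: `lookup_value` and `print_fields`
are hypothetical stand-ins, and the error message wording is made up.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real lookup: it only knows one key. */
static const char *lookup_value(const char *key)
{
	return strcmp(key, "references.format") ? NULL : "files";
}

/*
 * Print every recognized key and keep going past unknown ones.
 * Returns 0 if all keys were recognized, -1 if any was not.
 */
static int print_fields(const char **keys, size_t nr)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		const char *value = lookup_value(keys[i]);

		if (!value) {
			fprintf(stderr, "error: key '%s' not found\n", keys[i]);
			ret = -1;
			continue;
		}
		printf("%s=%s\n", keys[i], value);
	}
	return ret;
}
```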

> This is talking about null-terminated format, but the implementation
> doesn't seem to emit NUL-terminated output at all.

Oops. I forgot to change it when rebasing...

> In this case, if you call this function with a distinct repository
> name each time, then you don't have to remove the repository at all.
> Moreover, giving each repository a distinct and _meaningful_ name,
> rather than reusing the same name, could also be helpful when
> diagnosing failures.

Nice solution! I'll do that.

> With only two callers, it's not clear at this point whether the
> `test_repo_info` function is providing any added value, especially
> since the additional abstraction increases cognitive load, but perhaps
> later patches in this series add more callers?

Yes. In the next patches of this patchset I'm adding other values (and
there are others that will be added in future patchsets). The tests will
look very similar, only changing the repository creation, the key and
the expected value. Then this will decrease the repetition (and
copy-paste typos).

In the last patch of this series I also add the null-terminated format.
Having two formats doubles the number of tests, and this function will
avoid even more code repetition.