On Sun, Jul 27, 2025 at 1:52 PM Lucas Seiki Oshiro <lucasseikioshiro@xxxxxxxxx> wrote: > This commit is part of the series that introduces the new subcommand > git-repo-info. > > The flag `--show-ref-format` from git-rev-parse is used for retrieving > the reference format (i.e. `files` or `reftable`). This way, it is > used for querying repository metadata, fitting in the purpose of > git-repo-info. > > Add a new field `references.format` to the repo-info subcommand > containing that information. > > Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@xxxxxxxxx> > --- > diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc > @@ -29,6 +29,10 @@ INFO KEYS > The set of data that `git repo` can return is grouped into the following > categories: > > +`references`:: > +Reference-related data: > +* `format`: the reference storage format Based upon the implementation, I can see that the user must type the key in "dotted" form: git repo info references.format but I wonder if the above documentation actually conveys this requirement. I don't think I would figure it out easily. Perhaps hand-holding the user by giving an example would help. > diff --git a/builtin/repo.c b/builtin/repo.c > +/* repo_info_fields keys should be in lexicographical order */ > +static const struct field repo_info_fields[] = { > + { "references.format", get_references_format }, > +}; How can we ensure that the lexicographical-order requirement won't break? If someone adds a new entry which is not in its proper place, presumably that will be noticed because some existing test in a test script will stop working, but it feels unnecessarily fragile and a bit of a maintenance burden. Also, this requirement does feel like a premature optimization. Do you expect this list to become so huge and the corresponding lookup function to be called so frequently that a simple brute-force linear search would be too slow? > +static int qsort_strcmp(const void *va, const void *vb) > { > + const char *a = *(const char **)va; > + const char *b = *(const char **)vb; > + > + return strcmp(a, b); > +} > + > +static int print_fields(int argc, const char **argv, struct repository *repo) > +{ > + const char *last = ""; > + > + QSORT(argv, argc, qsort_strcmp); I can see from the implementation that you are sorting the incoming arguments in order to detect and fold out duplicates. However, that raises a couple questions. First, is it really a good idea to do something other than what the user asked for? Second, if this is a good idea, then should the behavior be documented? I can see arguments in favor of sorting and de-duplicating, as well as in favor of producing exactly the (unordered) output the user asked for, including duplicates, so I don't have a strong opinion either way. But, if you do retain this behavior, then the sorting and deduplication behaviors should probably be documented. > + for (int i = 0; i < argc; i++) { > + get_value_fn *get_value; > + const char *key = argv[i]; > + struct strbuf value; > + > + if (!strcmp(key, last)) > + continue; > + > + strbuf_init(&value, 64); > + get_value = get_value_fn_for_key(key); > + > + if (!get_value) { > + strbuf_release(&value); > + return error(_("key '%s' not found"), key); > + } > + > + get_value(repo, &value); A couple observations: First, you don't actually use the strbuf until the call to get_value(), so the strbuf_init() call seems to be too early, with the result that you need a corresponding strbuf_release() in the error branch. If you move the strbuf_init() so it occurs immediately before get_value(), then you can simplify the early exit case. Second, this seems to be getting unnecessarily intimate with strbuf. I can guess that you're doing this late strbuf_init() to avoid an allocation in the case when a duplicate key was encountered, however, STRBUF_INIT doesn't actually perform an allocation, so it would be clearer to just initialize the strbuf at the time you declare it rather than calling strbuf_init() manually. However... Although it is a micro optimization (as well as a premature-optimization), it is far more common in this code base to hoist the strbuf outside of the loop and instead call strbuf_reset() it upon each iteration: struct strbuf value = STRBUF_INIT; for (...) { strbuf_reset(&value); ... if (error_condition) { strbuf_release(...); return error(...); } ... } strbuf_release(...); > + printf("%s=%s\n", key, value.buf); > + last = key; > + strbuf_release(&value); > + } > + > return 0; > } Looking at this from a higher level, is it presenting a good user-experience by potentially printing some output but then erroring out upon the first unrecognized key? Would the user-experience be improved by instead continuing the loop even after reporting an error, and then adjusting the final `return 0` to conditionally return success or error depending upon whether any keys were unrecognized? > diff --git a/t/t1900-repo.sh b/t/t1900-repo.sh > @@ -0,0 +1,57 @@ > +#!/bin/sh > + > +test_description='test git repo-info' > + > +. ./test-lib.sh > + > +# Test if a field is correctly returned in the null-terminated format This is talking about null-terminated format, but the implementation doesn't seem to emit NUL-terminated output at all. > +# Usage: test_repo_info <label> <init command> <key> <expected value> > +# > +# Arguments: > +# label: the label of the test > +# init command: a command that creates a repository called 'repo', configured > +# accordingly to what is being tested > +# key: the key of the field that is being tested > +# expected value: the value that the field should contain > +test_repo_info () { > + label=$1 > + init_command=$2 > + key=$3 > + expected_value=$4 > + > + test_expect_success "$label" ' > + test_when_finished "rm -rf repo" && > + eval "$init_command" && > + echo "$expected_value" >expected && > + git -C repo repo info "$key" >output && > + cut -d "=" -f 2 <output >actual && > + test_cmp expected actual > + ' > +} It seems that this could be simplified by crafting the expected output more precisely? eval ... && echo "$key=$expected_value" >expect && git -C repo repo info "$key" >actual && test_cmp expect actual By the way, we typically avoid cleaning up test detritus merely for the sake of cleaning up because doing so slows down the already too-slow test suite. Instead, cleanup is usually only performed when absolutely necessary to avoid some undesirable interaction between tests. Avoiding cleanup also makes it easier to debug failed tests since (hopefully) the cause of the failure is still present in the "trash" directory. In this case, if you call this function with a distinct repository name each time, then you don't have to remove the repository at all. Moreover, giving each repository a distinct and _meaningful_ name, rather than reusing the same name, could also be helpful when diagnosing failures. > +test_repo_info 'ref format files is retrieved correctly' ' > + git init --ref-format=files repo' 'references.format' 'files' > + > +test_repo_info 'ref format reftable is retrieved correctly' ' > + git init --ref-format=reftable repo' 'references.format' 'reftable' This is overly fragile. The `test_repo_info` function hardcodes the name "repo" but then the caller is also expected to pass in the name as part of the initialization argument (i.e. `git init ... repo`). To make this more robust, either don't hardcode it in the function, or stop expecting the caller to supply the name as part of the initialization argument. With only two callers, it's not clear at this point whether the `test_repo_info` function is providing any added value, especially since the additional abstraction increases cognitive load, but perhaps later patches in this series add more callers? > +test_expect_success 'git-repo-info aborts if an invalid key is requested' ' > + test_when_finished "rm -rf expected err" && > + echo "error: key '\'foo\'' not found" >expected && > + test_must_fail git repo info foo 2>err && > + test_cmp expected err > +' > + > +test_expect_success "only one value is returned if the same key is requested twice" ' This test title can be single-quoted rather than double-quoted. > + test_when_finished "rm -f expected_key expected_value actual_key actual_value output" && > + echo "references.format" >expected_key && > + git rev-parse --show-ref-format >expected_value && > + git repo info references.format references.format >output && > + cut -d "=" -f 1 <output >actual_key && > + cut -d "=" -f 2 <output >actual_value && > + test_cmp expected_key actual_key && > + test_cmp expected_value actual_value > +' As above, it seems that this could be simplified by crafting the expected state more precisely rather than slicing and dicing the actual output. Perhaps something like this? val=$(git rev-parse --show-ref-format) && echo "references.format=$val" >expect && git repo info references.format references.format >actual && test_cmp expect actual Aside from the above tests, based upon the implementation, I also expected to find a test checking that the command correctly outputs multiple values, but perhaps a later patch adds that since, presently, the implementation only knows "references.format" (thus with deduplication, you can't yet implement such a test).