Re: [GSoC PATCH v7 2/5] repo: add the field references.format

Eric Sunshine <sunshine@xxxxxxxxxxxxxx> · Fri, 1 Aug 2025 16:59:15 -0400

On Fri, Aug 1, 2025 at 9:11 AM Lucas Seiki Oshiro
<lucasseikioshiro@xxxxxxxxx> wrote:
> This commit is part of the series that introduces the new subcommand
> git-repo-info.
> [...]
> Add a new field `references.format` to the repo-info subcommand
> containing that information.
>
> Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@xxxxxxxxx>
> ---
> diff --git a/Documentation/git-repo.adoc b/Documentation/git-repo.adoc
> @@ -22,6 +22,20 @@ COMMANDS
> +In order to obtain a set of values from `git repo info`, you should provide
> +the keys that identify them. Here's a list of the available keys and the
> +values that they return:
> +
> +`references.format`::
> +The reference storage format. The valid values are:
> ++
> +include::ref-storage-format.adoc[]

In the implementation below, I see that this version of the series
passes all the printed values through quote_c_style(), which is a
welcome change, however, an equally (if not more) important change
seems to be missing. Namely, we _must_ document that values with
"funny" characters will be C-style quoted. Without such documentation,
consumers are left in the same sort of situation as they were in
without quote_c_style(); to wit, they will be surprised and their
tooling may break when they suddenly encounter a value which is
quoted.

> diff --git a/builtin/repo.c b/builtin/repo.c
> +static int print_fields(int argc, const char **argv, struct repository *repo)
> +{
> +       int ret = 0;
> +       const char *last = "";
> +       struct strbuf sb = STRBUF_INIT;
> +
> +       QSORT(argv, argc, qsort_strcmp);
> +
> +       for (int i = 0; i < argc; i++) {
> +               get_value_fn *get_value;
> +               const char *key = argv[i];
> +               char *value;
> +
> +               if (!strcmp(key, last))
> +                       continue;
> +
> +               get_value = get_value_fn_for_key(key);
> +
> +               if (!get_value) {
> +                       ret = error(_("key '%s' not found"), key);
> +                       continue;
> +               }
> +
> +               strbuf_reset(&sb);
> +               get_value(repo, &sb);
> +
> +               value = strbuf_detach(&sb, NULL);
> +               quote_c_style(value, &sb, NULL, 0);
> +               free(value);
> +
> +               printf("%s=%s\n", key, sb.buf);
> +               last = key;
> +       }
> +
> +       strbuf_release(&sb);
> +       return ret;
> +}

This logic leads to a poor user-experience if the user asks for the
same non-existent key multiple times since that case subverts the
deduplication logic. For instance:

    % git repo info non.existent references.format non.existent
    key 'non.existent' not found
    key 'non.existent' not found
    references.format=gobbledygook

You can fix this by performing the `last` assignment earlier in the
loop prior to any other `continue` statements:

    if (!strcmp(key, last))
        continue;
    last = key;

    get_value = get_value_fn_for_key(key);
    if (!get_value) {
        ret = error(_("key '%s' not found"), key);
        continue;
    }

That aside, the strbuf detach/repurpose/free dance that the code does:

    value = strbuf_detach(&sb, NULL);
    quote_c_style(value, &sb, NULL, 0);
    free(value);

is unnecessarily confusing and difficult to fathom because it is
repurposing the strubuf and increasing the number of allocations and
deallocations for no apparent reason. You can decrease the cognitive
load simply by using two strbufs, one for each distinct purpose,
perhaps like this:

    struct strbuf valbuf = STRBUF_INIT;
    struct strbuf quotbuf = STRBUF_INIT;
    ...
    for (int i = 0; i < argc; i++) {
        ...
        strbuf_reset(&valbuf);
        strbuf_reset(&quotbuf);
        get_value(repo, &valbuf);
        quote_c_style(valbuf.buf, &quotbuf, NULL, 0);
        printf("%s=%s\n", key, quotbuf.buf);
    }
    strbuf_release(&quotbuf);
    strbuf_release(&valbuf);

> diff --git a/t/t1900-repo.sh b/t/t1900-repo.sh
> @@ -0,0 +1,57 @@
> +# Test whether a key-value pair is correctly returned
> +#
> +# Usage: test_repo_info <label> <init command> <key> <expected value>
> +#
> +# Arguments:
> +#   label: the label of the test
> +#   init command: a command which creates a repository named with its first argument,
> +#      accordingly to what is being tested
> +#   key: the key of the field that is being tested
> +#   expected value: the value that the field should contain
> +test_repo_info () {
> +       label=$1
> +       init_command=$2
> +       repo_name=$3
> +       key=$4
> +       expected_value=$5

The function documentation (including "Usage") talks about four
arguments, but the function expects five.

I'm having trouble understanding what is meant by "repository named
with its first argument accordingly to what is being tested". Also:
s/accordingly/according/

> +       test_expect_success "$label" '
> +               eval "$init_command $repo_name" &&
> +               echo "$key=$expected_value" >expected &&
> +               git -C $repo_name repo info "$key" >actual &&
> +               test_cmp expected actual
> +       '
> +}
> +
> +test_expect_success 'git-repo-info outputs data even if there is an invalid field' '
> +       echo "references.format=files" >expected &&
> +       test_must_fail git repo info foo references.format bar >actual &&
> +       test_cmp expected actual
> +'
> +
> +test_expect_success 'only one value is returned if the same key is requested twice' '
> +       val=$(git rev-parse --show-ref-format) &&
> +       echo "references.format=$val" >expect &&
> +       git repo info references.format references.format >actual &&
> +       test_cmp expect actual
> +'

These tests are easier to understand and are more robust in this version. Good.