Re: [GSoC RFC PATCH v2 5/7] repo-info: add the field references.format

Lucas Seiki Oshiro <lucasseikioshiro@xxxxxxxxx> · Fri, 4 Jul 2025 18:11:22 -0300

> Hm, okay. I guess later patches will add separate enums for each
> category, so this saves us a couple bytes as the number of categories
> grows.

Yeah, that's the idea.

However, even though it saves some bytes (we don't need to keep several empty 
fields), I used a union here mostly because of what it means semantically. In
the print functions where I'm `switch`ing between values of one of those enums,
the compiler complains if I missed something :-).

> I wonder a bit what it buys us that we have the difference between the
> category and reference format. Right now it feels like it can cause more
> errors that it prevents, as we now always have doubly-nested switches.

I reckon that my solution was a little hacky bit, but I tried to solve two
things:

1. The plaintext/linewise/null-terminated format can return the values in the
order they were requested, like this:

```
$ git repo-info --format=plaintext layout.bare references.format layout.shallow
layout.bare=true
references.format=files
layout.shallow=false

```

2. The same is not applicable to the JSON format, where the order shouldn't
matter and we can't have the same key repeating. Then it works like this:

```
$ git repo-info --format=json layout.bare references.format layout.shallow
{
  "references": {
    "format": "files"
  },
  "layout": {
    "bare": false,
    "shallow": false
  }
}
```

but not like this:

```
$ git repo-info --format=json layout.bare references.format layout.shallow
{
  "layout": {
    "bare": false
  },
  "references": {
    "format": "files"
  },
  "layout": {
    "shallow": false
  }
}
```
.

I'm still open to changes here. Perhaps keeping the same order in 1. is not so
useful from this v2 as I'll always return the name of the keys (like, for
example, git-config but unlike git-rev-parse).

> Wouldn't it make more sense to only only pass around the fields as
> `repo_info_references_field`? We could then have two arrays that we
> define globally:
> 
>    static const char const* name_by_field[] = {
>        [FIELD_REFERENCES_FORMAT] = "references.format",
>    };
> 
>    static repo_info_category category_by_field[] = {
>        [FIELD_REFERENCES_FORMAT] = CATEGORY_REFERENCES,
>    };
> 
> So `name_by_field[FIELD_REFERENCES_FORMAT]` would yield the name and
> `category_by_field[CATEGORY_REFERENCES]` would yield its category. But
> the benefit is that you only need to pass around the field enum from now
> on, all other information is implicit.
> 
> The reverse information can also be obtained easily. To e.g. get all
> fields of a reference you'd iterate through `category_by_field` and take
> all array indices whose value matches the desired category.

A downside that I see is that it seems to be make the print_json function
too complex. Currently, the complexities of the printing functions are (if
I'm not missing something and ignoring the complexity of data the retrieval):

- plaintext: O(n_fields), as it only iterates over the fields

- json: O(n_fields + F), where F is number of possible fields that we can get,
  as we first fill the `<category>_fields` variables and then we iterate over
  each possible field to find if it was requested

In the plaintext format it wouldn't change too much, it would be still
O(n_fields). But in the json format, it seems that will be something like
(in pseudocode):

```
jw = json_writer
fields_to_print = array with F values set to false

for (field in repo_info->fields)
    fields_to_print[field] = true

for (category in all_categories)
    object_started = false

    for (field in all_fields)
        if (category_by_field[field] == category)
            if (!object_started)
                jw_object_inline_begin_object(&jw, category)
            jw_object(&jw, name_by_field[field], retrieve(field))

    if (object_started)
        jw_end(&jw)

```

which would be O(n_fields + n_categories * F). To be honest I'm not exactly
thinking about performance (which would be in fact negligible in this command),
but how complex the code would be. If I understood it correctly, I wwould be
trading nested `switch`es by nested `for`s.

But again, I'm still open to change everything again here!

Anyway, thank you for your time reviewing this patch, seeing what I did in the
past weeks and joining again the discussion! :-)