Re: [PATCH v6 3/5] promisor-remote: refactor how we parse advertised fields

Christian Couder <christian.couder@xxxxxxxxx> · Thu, 31 Jul 2025 09:22:47 +0200

On Mon, Jul 21, 2025 at 10:39 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
> > +static struct promisor_info *parse_one_advertised_remote(struct strbuf *remote_info)
> > +{
> > +     struct promisor_info *info = xcalloc(1, sizeof(*info));
> > +     struct strbuf **elems = strbuf_split(remote_info, ',');
>
> Unless the primary use of an array is about passing it around as a
> whole "set", name such a variable singular, so that element[4] can
> be naturally read as "fourth element"---"fourth elements" is not as
> natural.
>
> Also, can't we do this without strbuf_split(), which is a wrong API
> to use in general [*]?  strbuf is a very good data structure to work
> with when editing string data, but an array of strbuf is not---you
> would not be editing many pieces of string data in parallel.
>
>     [*] often string_list_split_in_place() is a better alternative,
>     especially when you do not have to heavily edit the substrings.

Yeah, string_list_split_in_place() is used in v7 and it looks better.

> > +     for (size_t i = 0; elems[i]; i++) {
> > +             char *elem = elems[i]->buf;
> > +             char *value;
> > +             char *p = strchr(elem, '=');
>
>         The pointer elem points at the name, and the pointer p
>         points at the beginning of value, which could contain '='.
>
> > +             strbuf_strip_suffix(elems[i], ",");
>
> This does not even count as "editing"; split_in_place() would have
> removed the trailing comma (and replaced it with NUL to terminate
> the string).

Yeah, right. This is fixed in v7.

> > +             if (!p) {
> > +                     warning(_("invalid element '%s' from remote info"), elem);
> > +                     continue;
> > +             }
>
>         elem pointed at "foo" or "foo,"; we may have stripped the
>         trailing comma, but we didn't see the equal sign to start
>         the value at all.  Bad input.
>
> > +             *p = '\0';
>
>         Terminate the name by replacing '=' with NUL.
>
> > +             value = url_percent_decode(p + 1);
>
>         Can this helper function fail and signal that it saw a
>         malformed data?  If not already, shouldn't it be taught to
>         do so?
>
> We are inventing the syntax for this data in this series, so if this
> helper takes garbage data silently, and if we are not willing to fix
> it, then we can even consider changing the syntax to something with
> a helper we can use that already has a good error checking.

I am not sure what the interface would look like. Should a version of
url_percent_decode() that returns NULL in case of error would be
enough, or should it use `return error("...");`in case of error?

And it seems to me that an error can only happen when the string
passed to url_percent_decode() contains a '%' which is not followed by
2 hexadecimal characters, or do you see other possible errors?

For now I haven't changed this in v7, but I am open to suggestions
about the best way to implement it in a v8 or maybe a separate series.

> > +             if (!strcmp(elem, "name"))
> > +                     info->name = value;
> > +             else if (!strcmp(elem, "url"))
> > +                     info->url = value;
> > +             else
> > +                     free(value);
>
> As url_percent_decode() always allocate a new copy of string even
> when there is nothing to decode, value will always be an allocated
> string, and if we are not storing it away, it will leak.  The copies
> we kept in info->{name,url} are ours to own.  Makes sense.
>
> > +     strbuf_list_free(elems);
>
> And because [elem..p] (name) we only peeked, we can safely release
> the whole thing.  If you used string_list_split_in_place(), you
> would only free the string_list shell without having to free the
> underlying string.

Yeah, it's better with string_list_split_in_place().

Thanks.