Re: [PATCH v2 00/11] do not overuse strbuf_split*()

Jeff King <peff@xxxxxxxx> · Sat, 2 Aug 2025 05:08:25 -0400

On Thu, Jul 31, 2025 at 03:54:22PM -0700, Junio C Hamano wrote:

> strbuf is a very good data structure to work with string data
> without having to worry about running past the end of the string.
> 
> But an array of strbuf is often a wrong data structure.  You rarely
> have need to be able to edit multiple strings represented by such an
> array simultaneously.  And strbuf_split*() that produces result in
> such a shape is a misdesigned API function.

Yeah, I agree it's a bad API, and I'd be happy if we could eventually
get rid of it.

Conversion to string_list mostly makes sense (though I especially like
the case in sub-process.c where we can just parse with skip_prefix()).

I looked over the patches and didn't see anything objectionable. But I
tried to think of subtle incompatibilities we might run into:

  1. I saw the different handling of "max" you had to deal with in one
     patch. Yuck. It might be worth tweaking string_list_split() to
     count "pieces" (which is how every other split function I've seen,
     like the one in perl, works). But that can be done later (and is
     tricky to do safely, since we wouldn't be changing the function
     signature).

  2. Is the handling of repeated delimiters the same? E.g., if I split
     on "/" and we see "foo//bar", do both split implementations yield
     the same output. I could see either of ("foo", "", "bar") and
     ("foo", "bar") being produced.

And...

> The most common use case of strbuf_split*() family of functions
> seems to be to trim away the whitespaces around each piece of split
> string.  With modern string_list_split*(), it is often no longer
> necessary.

Some of these sites do strbuf_rtrim() on the split pieces. But
STRING_LIST_SPLIT_TRIM does both left and right. Is this OK? I think so
because in both cases we are already splitting on space, so we wouldn't
expect left-hand spaces (of course you could have a stray tab or
something, but I suspect the new code matches the intent more closely).

-Peff