On Sat, Aug 02, 2025 at 10:09:59AM -0700, Junio C Hamano wrote: > > 2. Is the handling of repeated delimiters the same? E.g., if I split > > on "/" and we see "foo//bar", do both split implementations yield > > the same output. I could see either of ("foo", "", "bar") and > > ("foo", "bar") being produced. > > Same between? strbuf_split*() seems to be happy to produce an empty > piece in the result (and there is no "omit empty ones" feature). > You can choose with STRING_LIST_SPLIT_NOEMPTY both results, but for > a straight conversion, the vanilla one without that flag would do > what we want, I think. Yeah, I meant same between strbuf_split and string_list_split. I didn't actually peek into the behavior of either. It was more of a "did you check this it?" question. It sounds like you did. Good. > > Some of these sites do strbuf_rtrim() on the split pieces. But > > STRING_LIST_SPLIT_TRIM does both left and right. Is this OK? I think so > > because in both cases we are already splitting on space, so we wouldn't > > expect left-hand spaces (of course you could have a stray tab or > > something, but I suspect the new code matches the intent more closely). > > I did wonder about these "rtrim-only" callsites strbuf_split(), but > I did not think they needed ltrim, since they were splitting at SP > and rejected when the element were empty (remember, strbuf_split() > happily creates an empty element in the result). > > Their use of rtrim() is primarily when they split at SP they want to > remove that SP (remember, another misdesign of strbuf_split() is > that it leaves the delimiter itself at the end of each split piece). > Both 05/11 step for merge-tree and 06/11 step for notes are good > examples. Makes sense. I did not even realize all of the horrible gotchas of strbuf_split(). :) > So in short, yes, the new code is being a bit more lenient (the > original parsers were more strict and insisted on non-trimming > behaviour). We might consider tightening them by removing the > STRING_LIST_SPLIT_TRIM bit to be more faithful to the original, but > as you may have noticed, doing so would make it more strict at the > right end. The original that split at SP and then rtrimmed allowed > trailing HT after the data to be removed, but I do not think it was > part of the designed behaviour of the original anyway. So I am > inclined to say we should probably drop STRING_LIST_SPLIT_TRIM; > nobody would scream. Hmm, yeah, I see what you mean. The documentation for merge-tree is quite vague about whether it is a single space or if multiple spaces are allowed. I'd be inclined to leave it on the lenient side, since I don't think there is any ambiguity. The git-notes one does specifically say "SP", so anybody who would be broken by losing the extra trim is already violating the docs. But again, I'd be tempted to just err on the friendly side. I doubt either case matters too much either way, though. -Peff