Re: [PATCH 2/2] test-lib: teach test_seq the -f option

Jeff King <peff@xxxxxxxx> · Tue, 24 Jun 2025 06:36:52 -0400

On Tue, Jun 24, 2025 at 02:22:21AM -0400, Eric Sunshine wrote:

> > diff --git a/t/t0612-reftable-jgit-compatibility.sh b/t/t0612-reftable-jgit-compatibility.sh
> > @@ -112,14 +112,11 @@ test_expect_success 'JGit can read multi-level index' '
> > -               awk "
> > -                   BEGIN {
> > -                       print \"start\";
> > -                       for (i = 0; i < 10000; i++)
> > -                           printf \"create refs/heads/branch-%d HEAD\n\", i;
> > -                       print \"commit\";
> > -                   }
> > -               " >input &&
> > +               {
> > +                       echo start &&
> > +                       test_seq -f "create refs/heads/branch-%d HEAD" 10000 &&
> > +                       echo commit
> > +               } >input &&
> 
> I had suggested[1] an effectively equivalent change to Patrick for a
> couple tests in the nearby t0610, but he rejected[2] the idea due to
> the pure-shell version being significantly slower than the `awk`
> version.
> 
> Pondering his response today, I wondered if it would make sense to
> replace our pure-shell `test_seq` with an implementation via `awk`.
> However, if most of our sequences vend only a small set of numbers,
> then the startup cost of `awk` would probably swamp any savings,
> especially on Windows where process startup is extremely slow. Taking
> that into account, I further wondered if we could see an overall win
> by taking a hybrid approach in which we employ the pure-shell version
> if vending a small set of numbers, but fall back to an `awk` version
> if vending a lot of numbers, especially as in the test above or the
> tests in t0610. Anyhow, food for thought, or not, if you're not hungry
> for thought food.
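A minimal sketch of that hybrid idea follows. It assumes a hypothetical `hybrid_seq` helper (this is not git's actual `test_seq` from test-lib.sh) and an arbitrary cutoff stored in an assumed `HYBRID_SEQ_THRESHOLD` variable. It accepts the same `-f <format>` shape as the patch, runs a pure-shell loop for short ranges, and pays for a single `awk` process only for long ones:

```shell
#!/bin/sh
# Hypothetical hybrid sequence helper; NOT git's actual test_seq.
# The cutoff below is an assumed placeholder; it would need to be
# measured per platform, since process startup cost varies a lot.
HYBRID_SEQ_THRESHOLD=20000

# Usage: hybrid_seq [-f <format>] [<first>] <last>
# <format> is a printf-style format with one %d; a newline is appended.
hybrid_seq () {
	fmt='%d\n'
	if test "$1" = "-f"
	then
		if test $# -lt 3
		then
			echo >&2 "usage: hybrid_seq [-f <format>] [<first>] <last>"
			return 1
		fi
		fmt="$2"'\n'
		shift 2
	fi
	case $# in
	1) first=1 last=$1 ;;
	2) first=$1 last=$2 ;;
	*)
		echo >&2 "usage: hybrid_seq [-f <format>] [<first>] <last>"
		return 1
		;;
	esac

	if test $((last - first)) -lt "$HYBRID_SEQ_THRESHOLD"
	then
		# Short range: a pure-shell loop avoids spawning any process.
		i=$first
		while test "$i" -le "$last"
		do
			# The caller's format is intentionally printf's format string.
			printf "$fmt" "$i"
			i=$((i + 1))
		done
	else
		# Long range: one awk process; awk expands the \n in -v values.
		awk -v first="$first" -v last="$last" -v fmt="$fmt" 'BEGIN {
			for (i = first + 0; i <= last + 0; i++)
				printf fmt, i
		}'
	fi
}
```

Both branches emit identical output, so callers cannot tell which one ran; only the cutoff decides whether the one-time `awk` startup cost is worth paying.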

Ah, interesting. I didn't time it at all, as my general intuition for
shell performance is that counting process spawns overrides everything
else (though admittedly it is usually O(n) processes vs O(1), and here
we are going from one extra process to zero).

I did a few timings, and it looks like the shell wins at 10,000 on my
system, but awk wins at 50,000 (though there is a lot of run-to-run
noise; I think awk might even win at 10,000 on a loaded system, as this
is such a light load that CPU frequency throttling comes into play).
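A rough harness for this kind of comparison might look like the following. It is only a sketch: the `shell_seq`, `awk_seq`, `now_ms`, and `bench` names are hypothetical, it assumes GNU `date` (for the `%N` nanosecond field) and an `awk` on PATH, and a single run says little given the run-to-run noise described above.

```shell
#!/bin/sh
# Rough benchmark sketch: pure-shell loop vs. a single awk process.
# Assumes GNU date (for %N) and awk on PATH; results are noisy and
# machine-dependent.

# Pure-shell generator: one builtin printf per line, no extra process.
shell_seq () {
	i=1
	while test "$i" -le "$1"
	do
		printf 'create refs/heads/branch-%d HEAD\n' "$i"
		i=$((i + 1))
	done
}

# awk generator: one process startup, then the loop runs inside awk.
awk_seq () {
	awk -v n="$1" 'BEGIN {
		for (i = 1; i <= n + 0; i++)
			printf "create refs/heads/branch-%d HEAD\n", i
	}'
}

# Wall-clock milliseconds since the epoch (GNU date only).
now_ms () {
	echo $(( $(date +%s%N) / 1000000 ))
}

# bench <function> <count>: time one run, discarding its output.
bench () {
	start=$(now_ms)
	"$1" "$2" >/dev/null
	end=$(now_ms)
	echo "$1 $2: $((end - start)) ms"
}

for n in 10000 50000
do
	bench shell_seq "$n"
	bench awk_seq "$n"
done
```

Repeating the loop several times and comparing the spread is more telling than any one pair of numbers.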

I assumed that the culprit was a lack of buffering in the shell, but I
don't think so: awk seems to issue 10,000 write() calls, too. I guess it
is just internal shell overhead in issuing commands. Where is a JIT
byte-code shell interpreter when we need one? ;)

My inclination is not to worry about it too much. At 10,000 I think we
are talking about a few milliseconds. There's so much more low-hanging
fruit if somebody wants to optimize the test suite. IMHO readability is
more important here (and if we really want to optimize, doing it inside
test_seq would be better).

-Peff