On Thu, Mar 30, 2023 at 3:30 PM Jeff King <peff@xxxxxxxx> wrote:
> From: Eric Sunshine <sunshine@xxxxxxxxxxxxxx>
>
> An unclosed here-doc in a test is a problem, because it silently gobbles
> up any remaining commands. Since 99a64e4b73c (tests: lint for run-away
> here-doc, 2017-03-22) we detect this by piggy-backing on the internal
> chainlint checker in test-lib.sh.
>
> However, it would be nice to detect it in chainlint.pl, for a few
> reasons:
>
> - the output from chainlint.pl is much nicer; it can show the exact
> spot of the error, rather than a vague "somewhere in this test you
> broke the &&-chain or had a bad here-doc" message.
>
> - the implementation in test-lib.sh runs for each test snippet. And
> since it requires a subshell, the extra cost is small but not zero.
> If chainlint.pl can reliably find the problem, we can optimize the
> test-lib.sh code.
>
> The chainlint.pl code never intended to find here-doc problems. But
> since it has to parse them anyway (to avoid reporting problems inside
> here-docs), most of what we need is already there. We can detect the
> problem when we fail to find the missing end-tag in swallow_heredocs().
> The extra change in scan_heredoc_tag() stores the location of the start
> of the here-doc, which lets us mark it as the source of the error in the
> output (see the new tests for examples).
>
> [jk: added commit message and tests]
Thanks for packaging this up as a proper patch.
> Signed-off-by: Eric Sunshine <sunshine@xxxxxxxxxxxxxx>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
> New in this iteration (thanks again, Eric!).
>
> I changed the error-tag from "HERE" to "UNCLOSED-HEREDOC". I don't think
> there any syntactic limitations to worry about (it's just shown to the
> user).
The error-tag is plucked by the script at a couple places using a
regex such as /\?![^?]+\?!/ which won't be tripped up by the longer
and hyphenated name, so no problem there.
> And it needs to be descriptive enough not to confuse users.
> "HERE" is especially bad because my brain interprets it as "HERE there
> is an error", which makes me say "Right. What error?".
>
> So "HEREDOC" would work, but there is no reason (aside from length) not
> to err on the side of being descriptive.
I had the same thought about HERE being potentially misleading, thus
wavered between HERE and HEREDOC, but ultimately chose the shorter for
length-similarity with the existing AMP and LOOP. That reminds me
that, for some time now, I've been thinking that t/README ought to
have a section explaining the meaning of AMP and LOOP (and HERE, if we
went with that) since newcomers may not intuitively understand what
such terse complaints mean.
> diff --git a/t/chainlint.pl b/t/chainlint.pl
> @@ -80,7 +80,8 @@ sub scan_heredoc_tag {
> - push(@{$self->{heretags}}, $indented ? "\t$tag" : "$tag");
> + $$token[0] = $indented ? "\t$tag" : "$tag";
> + push(@{$self->{heretags}}, $token);
> return "<<$indented$tag";
> @@ -169,10 +170,18 @@ sub swallow_heredocs {
> - my $indent = $tag =~ s/^\t// ? '\\s*' : '';
> - $$b =~ /(?:\G|\n)$indent\Q$tag\E(?:\n|\z)/gc;
> + my $indent = $$tag[0] =~ s/^\t// ? '\\s*' : '';
> + $$b =~ /(?:\G|\n)$indent\Q$$tag[0]\E(?:\n|\z)/gc;
> + if (pos($$b) > $start) {
> + my $body = substr($$b, $start, pos($$b) - $start);
> + $self->{lineno} += () = $body =~ /\n/sg;
> + next;
> + }
> + push(@{$self->{parser}->{problems}}, ['UNCLOSED-HEREDOC', $tag]);
> + $$b =~ /(?:\G|\n).*\z/gc; # consume rest of input
> my $body = substr($$b, $start, pos($$b) - $start);
> $self->{lineno} += () = $body =~ /\n/sg;
> + last;
Oddly enough, these changes look fine to me.