When tests are executed via `test_expect_failure` we rather obviously expect the test itself to fail. If it unexpectedly does not fail then we count the test as a "fixed" test and announce that a known breakage has vanished: ok 1 - setup ok 2 - create refs/heads/main # TODO known breakage vanished ok 3 - create refs/heads/main with oldvalue verification ... ok 299 - update-ref should also create reflog for HEAD # 1 known breakage(s) vanished; please update test(s) # passed all remaining 298 test(s) 1..299 While we announce that tests should be updated, the overall test suite still passes. This makes it quite hard to detect when a test that has previously failed succeeds now as the developer needs to pay close attention to the exact output. Even more importantly, tests that only succeed on _some_ systems are even easier to miss now, as one would have to explicitly take a look at respective CI jobs to notice that those do pass now. Furthermore, we are about to introduce support for parsing TAP output in Meson. In contrast to prove(1), which treats unexpected passes as a successful test run, Meson treats those as failure. Neither of these tools is wrong in doing so. Quoting the TAP specification [1]: Should a todo test point begin succeeding, the harness may report it in some way that indicates that whatever was supposed to be done has been, and it should be promoted to a normal Test Point. So it is essentially implementation-defined how exactly the unexpected pass is reported, and whether it should cause the overall test suite to fail or not. It is unarguably a bad thing for us though if these tools interpret these differently, as it would mean that test results now depend on whether the developer uses prove(1) or Meson. Unify the behaviour by causing a test suite to fail when there are any unexpected passes. As prove(1) does not consider an unexpected pass to be an error this leads to somewhat funky output: t1400-update-ref.sh ................................ Dubious, test returned 1 (wstat 256, 0x100) All 299 subtests passed (1 TODO test unexpectedly succeeded) ... Test Summary Report ------------------- t1400-update-ref.sh (Wstat: 256 (exited 1) Tests: 299 Failed: 0) TODO passed: 2 Non-zero exit status: 1 But as we directly announce that the root cause is an unexpected TODO that has succeeded it's not all that bad. [1]: https://testanything.org/tap-version-14-specification.html Signed-off-by: Patrick Steinhardt <ps@xxxxxx> --- t/t0000-basic.sh | 4 ++-- t/test-lib.sh | 9 ++++++++- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh index 16b785f3b91..2b63e1c86ca 100755 --- a/t/t0000-basic.sh +++ b/t/t0000-basic.sh @@ -130,7 +130,7 @@ test_expect_success 'subtest: a failing TODO test' ' ' test_expect_success 'subtest: a passing TODO test' ' - write_and_run_sub_test_lib_test passing-todo <<-\EOF && + write_and_run_sub_test_lib_test_err passing-todo <<-\EOF && test_expect_failure "pretend we have fixed a known breakage" "true" test_done EOF @@ -142,7 +142,7 @@ test_expect_success 'subtest: a passing TODO test' ' ' test_expect_success 'subtest: 2 TODO tests, one passin' ' - write_and_run_sub_test_lib_test partially-passing-todos <<-\EOF && + write_and_run_sub_test_lib_test_err partially-passing-todos <<-\EOF && test_expect_failure "pretend we have a known breakage" "false" test_expect_success "pretend we have a passing test" "true" test_expect_failure "pretend we have fixed another known breakage" "true" diff --git a/t/test-lib.sh b/t/test-lib.sh index 0a124ffad38..5352209d3e4 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -1272,7 +1272,14 @@ test_done () { check_test_results_san_file_ "$test_failure" - if test -z "$skip_all" && test -n "$invert_exit_code" + if test "$test_fixed" != 0 + then + if test -z "$invert_exit_code" + then + GIT_EXIT_OK=t + exit 1 + fi + elif test -z "$skip_all" && test -n "$invert_exit_code" then say_color warn "# faking up non-zero exit with --invert-exit-code" GIT_EXIT_OK=t -- 2.50.0.rc0.604.gd4ff7b7c86.dirty