Lidong Yan <yldhome2d2@xxxxxxxxx> writes: > In git-diff, options like `-w` and `-I<regex>` require comparing > file contents to determine whether two files are the same, even when > their SHA values differ. Let's see if we can do something to clarify "the same" here. Perhaps ... two files are considered equivalent under the specified "ignore" rules, even when they are not bit-for-bit identical. > For options like `--raw`, `--name-status`, > and `--name-only`, git-diff deliberately compares only the SHA values > to determine whether two files are the same, for performance reasons. > As a result, a file shown in `git diff --name-status` may not appear > in `git diff --patch`. > > To quickly determine whether two files are identical, Add helper Following the above, perhaps replace "identical" with "equivalent". Also, ", Add helper" should be ", add a helper", as that comma is not finishing a sentence, hence the word that follows it is not at the beginning of the next sentence. > function diff_flush_patch_quiet() in diff.c. Add `.diff_optimize` > field in `struct diff_options`. When `.diff_optimize` is set to > `DIFF_OPT_DRY_RUN`, builtin_diff() will return immediately upon > detecting any change. Call diff_flush_patch_quiet() to determine > if we should flush `--raw`, `--name-only` or `--name-status` output. Also the implementation details like the name of the .diff_options member and the name of the helper function have changed, and the proposed log message should be updated to match. > Signed-off-by: Jeff King <peff@xxxxxxxx> > Signed-off-by: Lidong Yan <yldhome2d2@xxxxxxxxx> > --- > diff.c | 55 ++++++++++++++++++++++++++++---------- > diff.h | 2 ++ > t/t4013-diff-various.sh | 14 ++++++++++ > t/t4015-diff-whitespace.sh | 2 +- > xdiff-interface.h | 6 ++--- > 5 files changed, 61 insertions(+), 18 deletions(-) > > diff --git a/diff.c b/diff.c > index dca87e164f..3bd432db32 100644 > --- a/diff.c > +++ b/diff.c > @@ -2444,6 +2444,15 @@ static int fn_out_consume(void *priv, char *line, unsigned long len) > return 0; > } > > +static int quick_consume(void *priv, char *line UNUSED, unsigned long len UNUSED) > +{ > + struct emit_callback *ecbdata = priv; > + struct diff_options *o = ecbdata->opt; > + > + o->found_changes = 1; > + return 1; > +} OK. > @@ -3709,6 +3718,7 @@ static void builtin_diff(const char *name_a, > xdemitconf_t xecfg; > struct emit_callback ecbdata; > const struct userdiff_funcname *pe; > + int dry_run = o->dry_run; As the "dry_run" variable is used only once in this block, we probably do not want to add it. > if (must_show_header) { > emit_diff_symbol(o, DIFF_SYMBOL_HEADER, > @@ -3759,8 +3769,11 @@ static void builtin_diff(const char *name_a, > > if (o->word_diff) > init_diff_words_data(&ecbdata, o, one, two); > - if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume, > - &ecbdata, &xpp, &xecfg)) Instead we can check o->dry_run here. > + if (dry_run) > + xdi_diff_outf(&mf1, &mf2, NULL, quick_consume, > + &ecbdata, &xpp, &xecfg); We may want to leave a comment to explain why we ignore the error return from xdi_diff_outf()? Perhaps like below? if (o->dry_run) /* * Unlike the !dry_run case, we need to ignore the * return value from xdi_diff_outf() here, because * xdi_diff_outf() takes non-zero return from its * callback function as a sign of error and returns * early (which is why we return non-zer from our * callback, quick_consume()). Unfortunately, * xdi_diff_outf() signals an error by returning * non-zero. */ xdi_diff_outf(&mf1, &mf2, NULL, quick_consume, &ecbdata, &xpp, &xecfg); I am undecided. > + else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume, > + &ecbdata, &xpp, &xecfg)) > +/* return 1 if any change is found; otherwise, return 0 */ > +static int diff_flush_patch_quietly(struct diff_filepair *p, struct diff_options *o) > +{ > + int dry_run = o->dry_run; > + int found_changes = o->found_changes; In this codebase, these "original value of the variable X was this, we tentatively save that original value away, tweak the variable X to do something, and restore the saved value to variable X" variables are often called "saved_X". > + int ret; > + > + o->dry_run = 1; > + o->found_changes = 0; > + diff_flush_patch(p, o); > + ret = o->found_changes; > + o->dry_run = dry_run; > + o->found_changes |= found_changes; > + return ret; > +} In the previous iteration, .dry_run/.diff_optimize was set and reset in different places; doing it in a single function here makes it easier to understand what is going on. Nice improvement. > diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh > index 8ebd170451..b56a79d979 100755 > --- a/t/t4013-diff-various.sh > +++ b/t/t4013-diff-various.sh > @@ -648,6 +648,20 @@ test_expect_success 'diff -I<regex>: detect malformed regex' ' > test_grep "invalid regex given to -I: " error > ' > > +test_expect_success 'diff -I<regex>: ignore matching file' ' > + test_seq 50 >file1 && > + git add file1 && > + test_seq 50 | sed -e "s/13/ten and three/" -e "s/^[124-9].*/& /" >file1 && > + > + : >actual && > + git diff --raw --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual && > + git diff --name-only --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual && > + git diff --name-status --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual && > + ! grep "file1" actual && Perhaps use test_grep helper shell function, i.e. test_grep ! "file1" actual && > + git rm -f file1 Is this because later tests will break if you leave "file1" in the working tree and/or in the index? If so, we should use test_when_finished to make such a clean-up. If you insert test_when_finished "git rm file1; rm -f file1" && at the very beginning, before you create file1 with 1..50, when this test piece finishes executing (whether it completed successfully, or failed in the middle of the &&-chain), the specified command will run. On the other hand, if the later tests won't mind whether "file1" does or does not exist in the working tree and/or in the index, it is common to leave it behind without cleaning it. When running the test script with the "-i" option, i.e. $ sh t4013-diff-various.sh -i -v leaving the files that were used in the test, without cleaning up, sometimes helps your debugging of the test script. > diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh > index 52e3e476ff..e7be8c5a8f 100755 > --- a/t/t4015-diff-whitespace.sh > +++ b/t/t4015-diff-whitespace.sh > @@ -11,7 +11,7 @@ test_description='Test special whitespace in diff engine. > . "$TEST_DIRECTORY"/lib-diff.sh > > for opt_res in --patch --quiet -s --stat --shortstat --dirstat=lines \ > - --raw! --name-only! --name-status! > + --raw --name-only --name-status > do Wouldn't this make the "if the option is marked with !, tweak the test that notices these two equivalent paths are not-identical" extra code, whose beginning part we see below, unnecessary? The $expect_failure variable would always be an empty string, so "different but equivalent" test should see "git diff --exit-code" exit with status 0, right? Other than that, looking quite good. Thanks.