Re: [PATCH v4] diff: ensure consistent diff behavior with ignore options

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 07 Aug 2025 14:27:27 -0700

Lidong Yan <yldhome2d2@xxxxxxxxx> writes:

> In git-diff, options like `-w` and `-I<regex>` require comparing
> file contents to determine whether two files are the same, even when
> their SHA values differ.

Let's see if we can do something to clarify "the same" here.
Perhaps

	... two files are considered equivalent under the specified
	"ignore" rules, even when they are not bit-for-bit identical.

> For options like `--raw`, `--name-status`,
> and `--name-only`, git-diff deliberately compares only the SHA values
> to determine whether two files are the same, for performance reasons.
> As a result, a file shown in `git diff --name-status` may not appear
> in `git diff --patch`.
>
> To quickly determine whether two files are identical, Add helper

Following the above, perhaps replace "identical" with "equivalent".

Also, ", Add helper" should be ", add a helper", as that comma is
not finishing a sentence, hence the word that follows it is not at
the beginning of the next sentence.

> function diff_flush_patch_quiet() in diff.c. Add `.diff_optimize`
> field in `struct diff_options`. When `.diff_optimize` is set to
> `DIFF_OPT_DRY_RUN`, builtin_diff() will return immediately upon
> detecting any change. Call diff_flush_patch_quiet() to determine
> if we should flush `--raw`, `--name-only` or `--name-status` output.

Also the implementation details like the name of the .diff_options
member and the name of the helper function have changed, and the
proposed log message should be updated to match.

> Signed-off-by: Jeff King <peff@xxxxxxxx>
> Signed-off-by: Lidong Yan <yldhome2d2@xxxxxxxxx>
> ---
>  diff.c                     | 55 ++++++++++++++++++++++++++++----------
>  diff.h                     |  2 ++
>  t/t4013-diff-various.sh    | 14 ++++++++++
>  t/t4015-diff-whitespace.sh |  2 +-
>  xdiff-interface.h          |  6 ++---
>  5 files changed, 61 insertions(+), 18 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index dca87e164f..3bd432db32 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2444,6 +2444,15 @@ static int fn_out_consume(void *priv, char *line, unsigned long len)
>  	return 0;
>  }
>  
> +static int quick_consume(void *priv, char *line UNUSED, unsigned long len UNUSED)
> +{
> +	struct emit_callback *ecbdata = priv;
> +	struct diff_options *o = ecbdata->opt;
> +
> +	o->found_changes = 1;
> +	return 1;
> +}

OK.

> @@ -3709,6 +3718,7 @@ static void builtin_diff(const char *name_a,
>  		xdemitconf_t xecfg;
>  		struct emit_callback ecbdata;
>  		const struct userdiff_funcname *pe;
> +		int dry_run = o->dry_run;

As the "dry_run" variable is used only once in this block, we
probably do not want to add it.

>  		if (must_show_header) {
>  			emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
> @@ -3759,8 +3769,11 @@ static void builtin_diff(const char *name_a,
>  
>  		if (o->word_diff)
>  			init_diff_words_data(&ecbdata, o, one, two);
> -		if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
> -				  &ecbdata, &xpp, &xecfg))

Instead we can check o->dry_run here.

> +		if (dry_run)
> +			xdi_diff_outf(&mf1, &mf2, NULL, quick_consume,
> +				      &ecbdata, &xpp, &xecfg);

We may want to leave a comment to explain why we ignore the error
return from xdi_diff_outf()?  Perhaps like below?

		if (o->dry_run)
			/*
                         * Unlike the !dry_run case, we need to ignore the
			 * return value from xdi_diff_outf() here, because
			 * xdi_diff_outf() takes non-zero return from its
                         * callback function as a sign of error and returns
		         * early (which is why we return non-zer from our
			 * callback, quick_consume()).  Unfortunately,
			 * xdi_diff_outf() signals an error by returning
			 * non-zero.
                         */
			xdi_diff_outf(&mf1, &mf2, NULL, quick_consume,
				      &ecbdata, &xpp, &xecfg);

I am undecided.

> +		else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
> +				       &ecbdata, &xpp, &xecfg))

> +/* return 1 if any change is found; otherwise, return 0 */
> +static int diff_flush_patch_quietly(struct diff_filepair *p, struct diff_options *o)
> +{
> +	int dry_run = o->dry_run;
> +	int found_changes = o->found_changes;

In this codebase, these "original value of the variable X was this, we
tentatively save that original value away, tweak the variable X to do
something, and restore the saved value to variable X" variables are often
called "saved_X".

> +	int ret;
> +
> +	o->dry_run = 1;
> +	o->found_changes = 0;
> +	diff_flush_patch(p, o);
> +	ret = o->found_changes;
> +	o->dry_run = dry_run;
> +	o->found_changes |= found_changes;
> +	return ret;
> +}

In the previous iteration, .dry_run/.diff_optimize was set and reset in
different places; doing it in a single function here makes it easier to
understand what is going on.  Nice improvement.

> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> index 8ebd170451..b56a79d979 100755
> --- a/t/t4013-diff-various.sh
> +++ b/t/t4013-diff-various.sh
> @@ -648,6 +648,20 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
>  	test_grep "invalid regex given to -I: " error
>  '
>  
> +test_expect_success 'diff -I<regex>: ignore matching file' '
> +	test_seq 50 >file1 &&
> +	git add file1 &&
> +	test_seq 50 | sed -e "s/13/ten and three/" -e "s/^[124-9].*/& /" >file1 &&
> +
> +	: >actual &&
> +	git diff --raw --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
> +	git diff --name-only --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
> +	git diff --name-status --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
> +	! grep "file1" actual &&

Perhaps use test_grep helper shell function, i.e.

	test_grep ! "file1" actual &&

> +	git rm -f file1

Is this because later tests will break if you leave "file1" in the working
tree and/or in the index?  If so, we should use test_when_finished to make
such a clean-up.  If you insert

	test_when_finished "git rm file1; rm -f file1" &&

at the very beginning, before you create file1 with 1..50, when this test
piece finishes executing (whether it completed successfully, or failed in
the middle of the &&-chain), the specified command will run.

On the other hand, if the later tests won't mind whether "file1" does or
does not exist in the working tree and/or in the index, it is common to
leave it behind without cleaning it.  When running the test script with the
"-i" option, i.e.

	$ sh t4013-diff-various.sh -i -v

leaving the files that were used in the test, without cleaning up,
sometimes helps your debugging of the test script.

> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 52e3e476ff..e7be8c5a8f 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -11,7 +11,7 @@ test_description='Test special whitespace in diff engine.
>  . "$TEST_DIRECTORY"/lib-diff.sh
>  
>  for opt_res in --patch --quiet -s --stat --shortstat --dirstat=lines \
> -	       --raw! --name-only! --name-status!
> +	       --raw --name-only --name-status
>  do

Wouldn't this make the "if the option is marked with !, tweak the test that
notices these two equivalent paths are not-identical" extra code, whose
beginning part we see below, unnecessary?  The $expect_failure variable
would always be an empty string, so "different but equivalent" test should
see "git diff --exit-code" exit with status 0, right?

Other than that, looking quite good.

Thanks.