Re: [PATCH] diff: ensure consistent diff behavior with -I<regex> across output formats

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 29, 2025 at 05:28:00PM -0700, Junio C Hamano wrote:

> The enthusiasm is appreciated, but the implementation raises two
> questions.
> 
>  * This special cases -I<pattern>, but any option that causes us to
>    set the .diff_from_contents flag, not just -I<pattern>, can cause
>    the raw blob comparison to be potentially different from what the
>    blob contents are compared with various "ignore this class of
>    changes" criteria.  Shouldn't "git diff -w --name-status" and the
>    like get the same treatment?

We already have the diff_from_contents flag which is used for
--exit-code. We should be able to see where that logic is applied and do
something similar. It looks like it happens in diff_flush(), which makes
sense:

          if (output_format & DIFF_FORMAT_NO_OUTPUT &&
              options->flags.exit_with_status &&
              options->flags.diff_from_contents) {
                  /*
                   * run diff_flush_patch for the exit status. setting
                   * options->file to /dev/null should be safe, because we
                   * aren't supposed to produce any output anyway.
                   */
                  diff_free_file(options);
                  options->file = xfopen("/dev/null", "w");
                  options->close_file = 1;
                  options->color_moved = 0;
                  for (i = 0; i < q->nr; i++) {
                          struct diff_filepair *p = q->queue[i];
                          if (check_pair_status(p))
                                  diff_flush_patch(p, options);
                          if (options->found_changes)
                                  break;
                  }
          }

So here's a naive application of the same technique:

diff --git a/diff.c b/diff.c
index 76291e238c..0fe6eb7443 100644
--- a/diff.c
+++ b/diff.c
@@ -6845,8 +6845,28 @@ void diff_flush(struct diff_options *options)
 			     DIFF_FORMAT_CHECKDIFF)) {
 		for (i = 0; i < q->nr; i++) {
 			struct diff_filepair *p = q->queue[i];
-			if (check_pair_status(p))
-				flush_one_pair(p, options);
+
+			if (!check_pair_status(p))
+				continue;
+
+			if (options->flags.diff_from_contents) {
+				FILE *orig_out = options->file;
+				int orig_changes = options->found_changes;
+				int skip;
+
+				options->file = xfopen("/dev/null", "w");
+				diff_flush_patch(p, options);
+				skip = !options->found_changes;
+
+				fclose(options->file);
+				options->file = orig_out;
+				options->found_changes = orig_changes;
+
+				if (skip)
+					continue;
+			}
+
+			flush_one_pair(p, options);
 		}
 		separator++;
 	}

which works on a trivial example. It affects all of raw, name-only,
name-status, and checkdiff. I know Junio said that --raw should not be
affected, but I'm not sure I agree. Anyway, it should be possible to
split the logic by output type.

I'm not sure if stuff like --stat would need something similar. It's
already doing a content comparison, so presumably it handles it
internally. Maybe stuff like --dirstat would need it, too? In which case
we'd maybe want to annotate each filepair in an initial loop with
whether it's modified at the content-level, and then take that into
account in various code paths.

And of course it's horribly hacky looking. Some refactoring might help.
Certainly it is silly to open /dev/null each time through the loop.
There might also be a better way of checking whether the diff found
anything than the found_changes flag.

So this is really just sketching out the direction, and somebody would
need to figure out the details.

>  * Also, should we internally run diff twice, especially even when
>    we are going to show the patch output and are not limited to
>    FORMAT_NAME and FORMAT_NAME_STATUS?  Generally, running the real
>    diff in any of the diffcore transformatin is a sign of trouble.

The patch above also runs the diff twice for "-Ifoo --name-only -p". But
I think we are kind of stuck there. We want to show all name-only
entries before any content diffs. So either we have to run the content
diff twice, or we have to buffer it to show after we decide whether to
show name-only entries.

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux