Re: [PATCH] diff: ensure consistent diff behavior with -I<regex> across output formats

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Aug 2, 2025 at 06:22PM, Jeff King <peff@xxxxxxxx> wrote:
> 
> So here's a naive application of the same technique:
> 
> diff --git a/diff.c b/diff.c
> index 76291e238c..0fe6eb7443 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6845,8 +6845,28 @@ void diff_flush(struct diff_options *options)
>     DIFF_FORMAT_CHECKDIFF)) {
> for (i = 0; i < q->nr; i++) {
> struct diff_filepair *p = q->queue[i];
> - if (check_pair_status(p))
> - flush_one_pair(p, options);
> +
> + if (!check_pair_status(p))
> + continue;
> +
> + if (options->flags.diff_from_contents) {
> + FILE *orig_out = options->file;
> + int orig_changes = options->found_changes;
> + int skip;
> +
> + options->file = xfopen("/dev/null", "w");
> + diff_flush_patch(p, options);
> + skip = !options->found_changes;
> +
> + fclose(options->file);
> + options->file = orig_out;
> + options->found_changes = orig_changes;
> +
> + if (skip)
> + continue;
> + }
> +
> + flush_one_pair(p, options);
> }
> separator++;
> }
> 
> which works on a trivial example. It affects all of raw, name-only,
> name-status, and checkdiff. I know Junio said that --raw should not be
> affected, but I'm not sure I agree. Anyway, it should be possible to
> split the logic by output type.

I think I could do the same thing in diffcore_ignore(). Like:

+void diffcore_ignore(struct diff_options *o)
+{
+	struct diff_queue_struct *q = &diff_queued_diff;
+	struct diff_queue_struct outq = DIFF_QUEUE_INIT;
+
+	if (!(o->output_format &
+	    (DIFF_FORMAT_NAME |
+	     DIFF_FORMAT_NAME_STATUS)))
+		return;
+
+	for (int i = 0; i < q->nr; i++) {
+		struct diff_filepair *p = q->queue[i];
+		if (ignore_match(p, o))
+			diff_free_filepair(p);
+		else
+			diff_q(&outq, p);
+	}
+
+	free(q->queue);
+	*q = outq;
+}

And ignore_match(p, o) will run xdl_diff() for file pair p. This approach
ensures that the behavior of `git diff --raw` and `git diff --check` remains
unaffected.

> I'm not sure if stuff like --stat would need something similar. It's
> already doing a content comparison, so presumably it handles it
> internally. Maybe stuff like --dirstat would need it, too? In which case
> we'd maybe want to annotate each filepair in an initial loop with
> whether it's modified at the content-level, and then take that into
> account in various code paths.
> 
> And of course it's horribly hacky looking. Some refactoring might help.
> Certainly it is silly to open /dev/null each time through the loop.
> There might also be a better way of checking whether the diff found
> anything than the found_changes flag.
> 
> So this is really just sketching out the direction, and somebody would
> need to figure out the details.

Seems like compute_diffstat() will run xdl_diff() to fill diffstat. Since diff_flush()
calls compute_diffstat() for --stat, --dirstat=lines, --shortstat and --namestat, we
shouldn’t run an extra xdl_diff() for them. I don’t know if --dirstat=files would
need an extra xdl_diff() though.

>> * Also, should we internally run diff twice, especially even when
>>   we are going to show the patch output and are not limited to
>>   FORMAT_NAME and FORMAT_NAME_STATUS?  Generally, running the real
>>   diff in any of the diffcore transformatin is a sign of trouble.
> 
> The patch above also runs the diff twice for "-Ifoo --name-only -p". But
> I think we are kind of stuck there. We want to show all name-only
> entries before any content diffs. So either we have to run the content
> diff twice, or we have to buffer it to show after we decide whether to
> show name-only entries.

I haven’t thought of a good way to avoid running the diff twice either.
Caching the diff content seems quite complicated. Moreover, `git diff -G<regex> -p`
requires running the diff twice: the first diff is used to filter out file pairs
that don’t match, and the second diff outputs the patch.

On Tue, Jul 29, 2025 at 05:28:00PM -0700, Junio C Hamano wrote:
> 
> The enthusiasm is appreciated, but the implementation raises two
> questions.
> 
> * This special cases -I<pattern>, but any option that causes us to
>   set the .diff_from_contents flag, not just -I<pattern>, can cause
>   the raw blob comparison to be potentially different from what the
>   blob contents are compared with various "ignore this class of
>   changes" criteria.  Shouldn't "git diff -w --name-status" and the
>   like get the same treatment?

Yes, I think I could modify ignore_match() to support -w and other ignore options.

> Also, the usual way to compose a log message of this project is to
> 
>     - Give an observation on how the current system works in the
>       present tense (so no need to say "Currently X is Y", or
>       "Previously X was Y" to describe the state before your change;
>       just "X is Y" is enough), and discuss what you perceive as a
>       problem in it.
> 
>     - Propose a solution (optional---often, problem description
>       trivially leads to an obvious solution in reader's minds).
> 
>     - Give commands to somebody editing the codebase to "make it so",
>       instead of saying "This commit does X".
> 
> in this order.

Thank you once again for patiently explaining how to write proper log messages.
I’ve made notes and am ready to apply them to future commits.

Thanks,
Lidong







[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux