On Sun, Aug 2, 2025 at 06:22PM, Jeff King <peff@xxxxxxxx> wrote: > > So here's a naive application of the same technique: > > diff --git a/diff.c b/diff.c > index 76291e238c..0fe6eb7443 100644 > --- a/diff.c > +++ b/diff.c > @@ -6845,8 +6845,28 @@ void diff_flush(struct diff_options *options) > DIFF_FORMAT_CHECKDIFF)) { > for (i = 0; i < q->nr; i++) { > struct diff_filepair *p = q->queue[i]; > - if (check_pair_status(p)) > - flush_one_pair(p, options); > + > + if (!check_pair_status(p)) > + continue; > + > + if (options->flags.diff_from_contents) { > + FILE *orig_out = options->file; > + int orig_changes = options->found_changes; > + int skip; > + > + options->file = xfopen("/dev/null", "w"); > + diff_flush_patch(p, options); > + skip = !options->found_changes; > + > + fclose(options->file); > + options->file = orig_out; > + options->found_changes = orig_changes; > + > + if (skip) > + continue; > + } > + > + flush_one_pair(p, options); > } > separator++; > } > > which works on a trivial example. It affects all of raw, name-only, > name-status, and checkdiff. I know Junio said that --raw should not be > affected, but I'm not sure I agree. Anyway, it should be possible to > split the logic by output type. I think I could do the same thing in diffcore_ignore(). Like: +void diffcore_ignore(struct diff_options *o) +{ + struct diff_queue_struct *q = &diff_queued_diff; + struct diff_queue_struct outq = DIFF_QUEUE_INIT; + + if (!(o->output_format & + (DIFF_FORMAT_NAME | + DIFF_FORMAT_NAME_STATUS))) + return; + + for (int i = 0; i < q->nr; i++) { + struct diff_filepair *p = q->queue[i]; + if (ignore_match(p, o)) + diff_free_filepair(p); + else + diff_q(&outq, p); + } + + free(q->queue); + *q = outq; +} And ignore_match(p, o) will run xdl_diff() for file pair p. This approach ensures that the behavior of `git diff --raw` and `git diff --check` remains unaffected. > I'm not sure if stuff like --stat would need something similar. It's > already doing a content comparison, so presumably it handles it > internally. Maybe stuff like --dirstat would need it, too? In which case > we'd maybe want to annotate each filepair in an initial loop with > whether it's modified at the content-level, and then take that into > account in various code paths. > > And of course it's horribly hacky looking. Some refactoring might help. > Certainly it is silly to open /dev/null each time through the loop. > There might also be a better way of checking whether the diff found > anything than the found_changes flag. > > So this is really just sketching out the direction, and somebody would > need to figure out the details. Seems like compute_diffstat() will run xdl_diff() to fill diffstat. Since diff_flush() calls compute_diffstat() for --stat, --dirstat=lines, --shortstat and --namestat, we shouldn’t run an extra xdl_diff() for them. I don’t know if --dirstat=files would need an extra xdl_diff() though. >> * Also, should we internally run diff twice, especially even when >> we are going to show the patch output and are not limited to >> FORMAT_NAME and FORMAT_NAME_STATUS? Generally, running the real >> diff in any of the diffcore transformatin is a sign of trouble. > > The patch above also runs the diff twice for "-Ifoo --name-only -p". But > I think we are kind of stuck there. We want to show all name-only > entries before any content diffs. So either we have to run the content > diff twice, or we have to buffer it to show after we decide whether to > show name-only entries. I haven’t thought of a good way to avoid running the diff twice either. Caching the diff content seems quite complicated. Moreover, `git diff -G<regex> -p` requires running the diff twice: the first diff is used to filter out file pairs that don’t match, and the second diff outputs the patch. On Tue, Jul 29, 2025 at 05:28:00PM -0700, Junio C Hamano wrote: > > The enthusiasm is appreciated, but the implementation raises two > questions. > > * This special cases -I<pattern>, but any option that causes us to > set the .diff_from_contents flag, not just -I<pattern>, can cause > the raw blob comparison to be potentially different from what the > blob contents are compared with various "ignore this class of > changes" criteria. Shouldn't "git diff -w --name-status" and the > like get the same treatment? Yes, I think I could modify ignore_match() to support -w and other ignore options. > Also, the usual way to compose a log message of this project is to > > - Give an observation on how the current system works in the > present tense (so no need to say "Currently X is Y", or > "Previously X was Y" to describe the state before your change; > just "X is Y" is enough), and discuss what you perceive as a > problem in it. > > - Propose a solution (optional---often, problem description > trivially leads to an obvious solution in reader's minds). > > - Give commands to somebody editing the codebase to "make it so", > instead of saying "This commit does X". > > in this order. Thank you once again for patiently explaining how to write proper log messages. I’ve made notes and am ready to apply them to future commits. Thanks, Lidong