Lidong Yan <yldhome2d2@xxxxxxxxx> writes: > In git-diff, options like `-w` and `-I<regex>` require comparing > file contents to determine whether two files are the same, even when > their SHA values differ. For options like `--raw`, `--name-status`, > and `--name-only`, git-diff deliberately compares only the SHA values > to determine whether two files are the same, for performance reasons. > As a result, a file shown in `git diff --name-status` may not appear > in `git diff --patch`. > > To quickly determine whether two files are identical, Add helper > function diff_flush_patch_quiet() in diff.c. Add `.diff_optimize` > field in `struct diff_options`. When `.diff_optimize` is set to > `DIFF_OPT_DRY_RUN`, builtin_diff() will return immediately upon > detecting any change. Call diff_flush_patch_quiet() to determine > if we should flush `--raw`, `--name-only` or `--name-status` output. > > Signed-off-by: Jeff King <peff@xxxxxxxx> > Signed-off-by: Lidong Yan <yldhome2d2@xxxxxxxxx> > --- > diff.c | 67 +++++++++++++++++++++++++++++--------- > diff.h | 5 +++ > t/t4013-diff-various.sh | 14 ++++++++ > t/t4015-diff-whitespace.sh | 2 +- > xdiff-interface.h | 6 ++-- > 5 files changed, 74 insertions(+), 20 deletions(-) The code looks much easier to reason about than the previous rounds. A few comments about the design. - Are there other possible values that might fit in this "optimize" member, and what kind of behaviour would they trigger, that we can envision? I do not think of any and that is why the "enum diff_optimize" member in the diff_options structure smells more like a "bool dry_run". By the way, giving a member "diff_" prefix when the enclosing struct is clearly about "diff" by having a name "diff_options" is often a waste of readers' time. - It is unclear why the dry-run need to imply 0-line context. - diff_flush_patch_quietly() would be a better name for diff_flush_patch_quiet(). On to the details. > diff --git a/diff.c b/diff.c > index dca87e164f..5254ef9373 100644 > --- a/diff.c > +++ b/diff.c > @@ -2444,6 +2444,15 @@ static int fn_out_consume(void *priv, char *line, unsigned long len) > return 0; > } > > +static int quick_consume(void *priv, char *line, unsigned long len) > +{ > + struct emit_callback *ecbdata = priv; > + struct diff_options *o = ecbdata->opt; > + > + o->found_changes = 1; > + return 1; > +} OK, as a non-zero return value from consume callbacks is supposed to signal an error and causes xdiff_outf() an early return, this serves as a short-cut. One downside is that we cannot truly notice and signal an error to our callers, as we will see in a later hunk. > @@ -3709,6 +3718,7 @@ static void builtin_diff(const char *name_a, > xdemitconf_t xecfg; > struct emit_callback ecbdata; > const struct userdiff_funcname *pe; > + int dry_run = o->diff_optimize == DIFF_OPT_DRY_RUN; And this screams that o->dry_run that is a Boolean may be sufficient, but I could be missing obvious future enhancement opportunities. > @@ -3741,8 +3751,8 @@ static void builtin_diff(const char *name_a, > xpp.ignore_regex_nr = o->ignore_regex_nr; > xpp.anchors = o->anchors; > xpp.anchors_nr = o->anchors_nr; > - xecfg.ctxlen = o->context; > - xecfg.interhunkctxlen = o->interhunkcontext; > + xecfg.ctxlen = dry_run ? 0 : o->context; > + xecfg.interhunkctxlen = dry_run ? 0 : o->interhunkcontext; Unclear why. I think you had a similar change with a comment ... > @@ -3750,7 +3760,8 @@ static void builtin_diff(const char *name_a, > xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags); > > diffopts = getenv("GIT_DIFF_OPTS"); > - if (!diffopts) > + /* ignore ctxlen if we are in dry run mode */ ... here, but this comment is totally useless. > + if (!diffopts || dry_run) > ; > else if (skip_prefix(diffopts, "--unified=", &v)) > xecfg.ctxlen = strtoul(v, NULL, 10); Anybody who can read the code can tell that dry_run disables the xecfg.ctxlen handling we have below. What the code does not tell, hence the author of a patch must help the readers by writing *why* ignoring patch context is safe, correct, necessary, and desirable when the main non-dry-run invocation of the diff machinery, which this dry-run mode is trying to help, may use some context lines. It does not have to be done in in-code comment. Especially because the consequence of the design decision to "ignore context" appears in two different places, the proposed log message would be a better place to explain why it is a safe, correct, necessary and desirable thing to do. > - if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume, > - &ecbdata, &xpp, &xecfg)) > + if (dry_run) > + xdi_diff_outf(&mf1, &mf2, NULL, quick_consume, > + &ecbdata, &xpp, &xecfg); > + else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume, > + &ecbdata, &xpp, &xecfg)) > die("unable to generate diff for %s", one->path); And this is the consequence of xdi_diff_outf() not having an extensible way to report different "abnormal" conditions to its caller. In the normal case, all exceptional/abnormal conditions are error. But in the dry-run case, "abnormal return is an error and we should die" would not fit our purpose. "Assume that non-zero 'success' return is because our quick_consume() is signalling that it found what it needs to find, and that is not an error" is what is going on here. I'll stop here for now. Will continue later. Thanks.