Re: [PATCH v2] diff: ensure consistent diff behavior with -I<regex> across output formats

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 03 Aug 2025 21:36:02 -0700

Lidong Yan <yldhome2d2@xxxxxxxxx> writes:

>> I do not quite get why ignore_match() has to know so much about how
>> the real code in diff.c that implements -I<regex> works, compared to
>> the illustration of "here is how to do it" Peff posted, though.  It
>> somehow feels too much duplicated code.
>
> I did copy some code from diffcore-pickaxe.c. I will use Peff's code in the
> next patch and try to refactor diff_flush() to make the code simpler. Though
> the reason I match the regular expression in ignore_match() is that I want to
> return early as soon as an unmatched change is found. And indeed, it's not
> worth writing the duplicated code for this unknown performance benefit.

In the production code, it would be truly worth doing the
optimization; we want to avoid running diff twice if we can.

But I think the refactoring of the diff_flush() codepath may
involve some new mode (perhaps DIFF_FORMAT_DRYRUN or something) that

 (1) does not produce any output, like DIFF_FORMAT_NO_OUTPUT, so
     that we do not need to play with /dev/null like Peff's
     illustration.

 (2) knows that the caller is only interested in each path having
     any change worth reporting, so that it can short-circuit once a
     change is found for each path.

So, just before you decide whether to show a path in name-only or
name-status output, you'd do this extra diff_flush() in the dry-run
mode, which is run only to learn if each path has changes (with
various "ignore" criteria), and it can short-circuit as much as it
needs to.
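
To illustrate the shape of such a dry run, here is a rough, standalone
sketch.  It is not code from diff.c; struct toy_filepair,
toy_dryrun_has_change() and toy_flush_dryrun() are all made-up names,
and it simplifies -I<regex> to "a changed line is ignorable if it
matches the pattern".  The only real API it uses is POSIX <regex.h>.
The lines_scanned member exists only to make the short-circuit
observable.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for one path's changed lines; not a git struct. */
struct toy_filepair {
	const char *path;
	const char **changed;	/* added/removed lines for this path */
	size_t nr;
	int lines_scanned;	/* set by the dry run, to show the early exit */
};

/*
 * Return 1 as soon as one changed line does not match the ignore
 * pattern (i.e. the path has a change worth reporting), 0 if every
 * changed line is ignorable.  Nothing is written anywhere.
 */
int toy_dryrun_has_change(struct toy_filepair *p, const regex_t *ignore)
{
	size_t i;

	p->lines_scanned = 0;
	for (i = 0; i < p->nr; i++) {
		p->lines_scanned++;
		if (regexec(ignore, p->changed[i], 0, NULL, 0) == REG_NOMATCH)
			return 1;	/* short-circuit for this path */
	}
	return 0;
}

/*
 * Toy "dry-run flush": records in reportable[i] whether path i has a
 * reportable change.  Returns the number of reportable paths, or -1
 * if the pattern does not compile.
 */
int toy_flush_dryrun(struct toy_filepair *q, size_t nr,
		     const char *pattern, int *reportable)
{
	regex_t re;
	size_t i;
	int count = 0;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;
	for (i = 0; i < nr; i++) {
		reportable[i] = toy_dryrun_has_change(&q[i], &re);
		count += reportable[i];
	}
	regfree(&re);
	return count;
}
```

A caller would run toy_flush_dryrun() first and then emit names only
for the paths whose reportable[] entry is set, so the "real" output
pass never has to look at the ignored paths.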

Hmm?