Re: [PATCH v2] diff: ensure consistent diff behavior with -I<regex> across output formats

Lidong Yan <yldhome2d2@xxxxxxxxx> · Tue, 5 Aug 2025 17:23:38 +0800

Junio C Hamano <gitster@xxxxxxxxx> writes:
> 
> But I think the refactoring of diff_flush() codepath may
> involve some new mode (perhaps DIFF_FORMAT_DRYRUN or something) that
> 
> (1) does not produce any output, like DIFF_FORMAT_NO_OUTPUT, so
>     that we do not need to play with /dev/null like Peff's
>     illustration.
> 
> (2) knows that the caller is only interested in each path having
>     any change worth reporting, so that it can short-circuit once a
>     change is found for each path.
> 
> So, just before you want to decide showing name or name-status,
> you'd do this extra diff_flush() that is run only to learn if each
> path has changes (with various "ignore" criteria) in the dry-run
> mode, and it can do as much short-cut as it needs to.

I’m proposing to add a .diff_optimize field to struct diff_options, which
would support three modes: DIFF_OPT_NONE, DIFF_OPT_DRY_RUN,
and DIFF_OPT_BUFFER. The appropriate value would be determined
before calling diff_flush(), potentially in repo_diff_setup().
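A rough sketch of the new field, and of one possible way to pick its value,
is below. Everything in it is hypothetical: `enum diff_optimize`,
`struct diff_options_sketch`, the `FMT_*` bits (simplified stand-ins for the
real DIFF_FORMAT_* flags in diff.h), and the selection rule itself, which is
only a guess at a policy.

```c
/* Hypothetical: these names do not exist in Git; they only illustrate the idea. */
enum diff_optimize {
	DIFF_OPT_NONE = 0,  /* existing behaviour (Peff's code path) */
	DIFF_OPT_DRY_RUN,   /* only learn whether each path has a reportable change */
	DIFF_OPT_BUFFER     /* diff once, buffer hunks per file pair, reuse them */
};

/* Simplified stand-ins for the DIFF_FORMAT_* bits in diff.h. */
#define FMT_NAME        (1u << 0)  /* --name-only */
#define FMT_NAME_STATUS (1u << 1)  /* --name-status */
#define FMT_PATCH       (1u << 2)  /* -p */
#define FMT_STAT        (1u << 3)  /* --stat */

/* Hypothetical subset of struct diff_options carrying the proposed field. */
struct diff_options_sketch {
	unsigned output_format;
	int quiet;              /* --quiet: only a yes/no answer is needed */
	int has_ignore_regex;   /* at least one -I<regex> was given */
	enum diff_optimize diff_optimize;
};

/* One guess at a policy, evaluated once before diff_flush() runs. */
static enum diff_optimize choose_diff_optimize(const struct diff_options_sketch *o)
{
	const unsigned names_only = FMT_NAME | FMT_NAME_STATUS;
	unsigned fmt = o->output_format;

	if (!o->has_ignore_regex)
		return DIFF_OPT_NONE;      /* nothing to filter: keep the old path */
	if (o->quiet || (fmt && !(fmt & ~names_only)))
		return DIFF_OPT_DRY_RUN;   /* only "does this path change?" matters */
	if (fmt & (fmt - 1))
		return DIFF_OPT_BUFFER;    /* more than one format bit: share one diff */
	return DIFF_OPT_NONE;
}
```

Under this rule, `git diff -I<regex> --name-status` would take the dry-run
path and `git diff -I<regex> --stat -p` would diff each pair once and share
the buffered result; a lone `-p` gains nothing from buffering, so it stays
on the existing path.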

DIFF_OPT_NONE would keep the code path Peff provided. DIFF_OPT_DRY_RUN
would optimize --quiet, --name-only, --name-status, etc., so that we can
return early as soon as a change is found. DIFF_OPT_BUFFER would first emit
the changes, together with their surrounding context, into a buffer (giving
a map from each file pair to its change buffer); later operations would then
read from that buffer instead of calling xdl_diff() again.
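The short-circuit that DIFF_OPT_DRY_RUN relies on can be sketched with POSIX
regcomp()/regexec(). `struct pair_changes`, `has_reportable_change()` and
`dry_run()` are hypothetical, and the sketch assumes each pair's changed
lines are already available, whereas the real code would see them through
xdiff's callbacks. Since -I ignores a change only when all of its lines
match, a path has a reportable change as soon as one changed line fails to
match, so hunk boundaries do not matter for this yes/no question.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for a file pair plus the +/- lines a diff would produce. */
struct pair_changes {
	const char *path;
	const char **changed_lines;   /* added and removed lines, without the marker */
	size_t nr;
};

/*
 * Does any change in this pair survive the -I<regex> filter?  A change is
 * ignored only when all of its lines match, so one non-matching line is
 * enough, and we stop looking at this pair right there.
 */
static int has_reportable_change(const struct pair_changes *p, const regex_t *ignore)
{
	for (size_t i = 0; i < p->nr; i++)
		if (regexec(ignore, p->changed_lines[i], 0, NULL, 0) == REG_NOMATCH)
			return 1;
	return 0;
}

/*
 * Fill keep[i] for --name-only / --name-status and return how many paths
 * changed.  With quiet set (--quiet), stop at the first changed path; any
 * keep[] entries after it are left untouched.
 */
static size_t dry_run(const struct pair_changes *pairs, size_t nr,
		      const regex_t *ignore, int quiet, int *keep)
{
	size_t kept = 0;

	for (size_t i = 0; i < nr; i++) {
		keep[i] = has_reportable_change(&pairs[i], ignore);
		kept += keep[i];
		if (quiet && kept)
			break;
	}
	return kept;
}
```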

However, I’m concerned that DIFF_OPT_BUFFER could lead to high memory
usage, since the buffers for every file pair would have to be held until
the later operations finish with them, and I’m not sure the trade-off is
justified.
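To make the memory concern concrete, here is a minimal sketch of the map
from file pair to change buffer, with a running total of the bytes it
holds. `struct change_buf`, `struct change_cache` and the `cache_*()`
helpers are hypothetical, and indexing pairs by their position in the diff
queue is an assumption. Because every buffer stays alive until the cache is
cleared, `total_bytes` grows roughly with the sum of all buffered hunks plus
growth slack, rather than with the largest single pair.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-pair buffer: the hunks xdl_diff() emitted for one pair. */
struct change_buf {
	char *buf;
	size_t len, alloc;
};

/* Map from file pair (by index in the diff queue) to its buffered output. */
struct change_cache {
	struct change_buf *bufs;
	size_t nr;
	size_t total_bytes;   /* capacity held across all pairs until cleared */
};

static int cache_init(struct change_cache *c, size_t nr_pairs)
{
	c->bufs = calloc(nr_pairs, sizeof(*c->bufs));
	c->nr = nr_pairs;
	c->total_bytes = 0;
	return c->bufs ? 0 : -1;
}

/* Append emitted diff text for pair i, doubling its capacity as needed. */
static int cache_append(struct change_cache *c, size_t i, const char *data, size_t len)
{
	struct change_buf *b = &c->bufs[i];

	if (b->len + len > b->alloc) {
		size_t want = b->alloc ? b->alloc : 64;
		while (want < b->len + len)
			want *= 2;
		char *p = realloc(b->buf, want);
		if (!p)
			return -1;
		c->total_bytes += want - b->alloc;
		b->buf = p;
		b->alloc = want;
	}
	memcpy(b->buf + b->len, data, len);
	b->len += len;
	return 0;
}

/* Release every pair's buffer once all consumers are done with them. */
static void cache_clear(struct change_cache *c)
{
	for (size_t i = 0; i < c->nr; i++)
		free(c->bufs[i].buf);
	free(c->bufs);
	c->bufs = NULL;
	c->nr = c->total_bytes = 0;
}
```

Doubling keeps appends amortized O(1), but up to half of each buffer can be
unused slack, which adds to the memory cost.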

Thanks,
Lidong