Hi Niels
On 25/04/2025 16:59, Niels Glodny wrote:
The cleanup_records function marks some lines as changed
before running the actual diff algorithm. For most lines,
this is a good performance optimization, but it also marks
lines that are surrounded by many changed lines as changed
as well. This can cause redundant changes and longer-than-
necessary diffs.
Nicely explained. We normally wrap commit messages at 72 characters
Whether this results in better-looking diffs is subjective.
However, the --minimal flag explicitly requests the shortest
possible diff.
Looking at the diff for 2ff58dec493 (refs: introduce function to batch
refname availability checks, 2025-03-12) I'd say it is definitely less
readable with this change but some of the changes in 320f2061b63 (t0602:
use subshell to ensure working directory unchanged, 2025-02-28) are more
readable. Anyway as you say if we promise to find the minimal diff then
that is what we should do.
The performance impact of this change is negligible, and it
results in shorter diffs in about 1.3% of diffs in Git's
history.
Have you got any numbers for the performance change?
I think the premise of this patch is sound, I've left a couple of
comments below as I think we can simplify the code changes.
diff --git a/t/t4071-diff-minimal.sh b/t/t4071-diff-minimal.sh
new file mode 100755
index 0000000000..3ad759dab4
--- /dev/null
+++ b/t/t4071-diff-minimal.sh
@@ -0,0 +1,16 @@
+#!/bin/sh
+
+test_description='minimal diff algorithm'
+
+. ./test-lib.sh
+
+test_expect_success 'minimal diff should not mark changes between changed lines' '
+ printf "x\nx\nx\nx\n" >pre &&
+ printf "x\nx\nx\nA\nB\nC\nD\nx\nE\nF\nG\n" >post &&
+ test_must_fail git diff --no-index \
+ --minimal pre post >diff &&
+ ! grep "+x" diff &&
+ ! grep "-x" diff
+'
Thanks for taking the trouble to add a new test. We have a few test
helper functions which we could use here.
test_write_lines x x x x >pre &&
test_write_lines x x x A B C D x E F G >post &&
test_expect_code 1 git diff --no-index --minimal pre post >diff &&
test_grep ! ^[-+]x diff
test_expect_code ensures that the non-zero exit code is from the files
not matching rather than another error like an invalid option or missing
file. test_grep will display the file if there are matches which helps
when debugging test failures.
> [...]> -static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t
*xdf1, xdfile_t *xdf2) {
+static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1,
+ xdfile_t *xdf2, int need_min) {
long i, nm, nreff, mlim;
xrecord_t **recs;
xdlclass_t *rcrec;
@@ -379,7 +383,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
xdlclassifier_t carries a copy of the flag that were interested in so I
think at the start of xdl_cleanup_records we can add
int need_min = !!(cf->flags & XDF_NEED_MINIMAL);
and then we don't need to change the signature.
for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) {
rcrec = cf->rcrecs[(*recs)->ha];
nm = rcrec ? rcrec->len2 : 0;
- dis1[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
+ dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
If we want a minimal diff then we force dis1[i] to be 1 so that this
line will never be marked as changed before searching for the longest
common sequence. Well spotted.
Best Wishes
Phillip
}
if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
@@ -387,7 +391,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) {
rcrec = cf->rcrecs[(*recs)->ha];
nm = rcrec ? rcrec->len1 : 0;
- dis2[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
+ dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
}
for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart];
@@ -449,10 +453,10 @@ static int xdl_trim_ends(xdfile_t *xdf1, xdfile_t *xdf2) {
}
-static int xdl_optimize_ctxs(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) {
-
+static int xdl_optimize_ctxs(xdlclassifier_t *cf, xdfile_t *xdf1,
+ xdfile_t *xdf2, int need_min) {
if (xdl_trim_ends(xdf1, xdf2) < 0 ||
- xdl_cleanup_records(cf, xdf1, xdf2) < 0) {
+ xdl_cleanup_records(cf, xdf1, xdf2, need_min) < 0) {
return -1;
}
base-commit: f65182a99e545d2f2bc22e6c1c2da192133b16a3