On 24/05/2025 14:38, Phillip Wood wrote:
On 24/05/2025 10:08, René Scharfe wrote:
Am 24.05.25 um 07:57 schrieb René Scharfe:
Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget:
From: jinyaoguo <guo846@xxxxxxxxxx>
The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing
`i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or
`rchg2[i2-1]`), which underflows the buffer.
This commit adds explicit `i1 > 0` and `i2 > 0` checks around
those array accesses to prevent invalid negative indexing.
xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both
ends for rchg arrays, so an index of -1 should be within the bounds.
and rchg[-1] == 0 so i1 and i2 can never drop below -1
i1 and i2 are decreased in lockstep, though, so one of them can become
smaller than -1 if nrec is different between the files. And that's how
this code run can indeed run off into the weeds.
Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice
versa. Or can it? It makes sense that we reach the start of both
buffers at the same time if we walk backwards from the end, don't
misstep and have consistent rchg array contents, but I'm not sure.
The code looks like
for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0;
i1--, i2--)
if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
for (l1 = i1; rchg1[i1 - 1]; i1--);
for (l2 = i2; rchg2[i2 - 1]; i2--);
I think I've convinced myself that it is safe assuming there are an
equal number of unchanged lines in rchg1 and rchg2 and rchg1[-1] ==
rchg2[-1] == 0. Each iteration consumes any changed lines in the
preimage and the postimage plus a single context line from each (apart
from the final iteration when the context lines may have been
exhausted). At the start of the last iteration there are three
possibilities
- i1 == -1 && i2 >= 0 => there are insertions as the start of the file.
Sorry that should be i1 == 0 && i2 >= 0
As the context lines in the preimage have been exhausted all the
remaining rchg2 elements represent added lines and are consumed by
for (l2 = i2; rchg2[i2 - 1]; i2--) so at the end of the loop body
i2 == 0 and the outer loop will exit.
- i1 >= 0 && i2 == -1 => there are deletions at the start of the file.
This one should be i1 >= 0 && i2 == 0
As the context lines in the postimage have been exhausted all the
remaining rchg1 elements represent deleted lines and are consumed by
for (l1 = i1; rchg1[i1 - 1]; i1--) so at the end of the loop body
i1 == 0 and the outer loop will exit.
- i1 >= 0 && i2 >= 0 => the first line is unchanged or there are
insertions and deletions at the beginning of the file. At the end of
the loop body i1 == 0 && i2 == 0 and the outer loop will exit.
We could add
if (i1 < -1 || i2 < -1)
BUG("mismatched context line count");
before "if (rchg1[i1 - 1] || rchg2[i2 - 1])" inside the loop if we're
worried about bugs that break the assumption that there are equal
numbers of context lines on each side. Any such bug would generate
invalid diffs. I don't know how likely that is to happen in practice.
Best Wishes
Phillip
Are you able to demonstrate any out-of-bounds access with e.g.,
Valgrind, AddressSanitizer or an assertion?
Curiously, AddressSanitizer doesn't report anything, but if I add the
following line after the outer for, I can trigger it to report a
heap-buffer-overflow with e.g., git show 8613c2bb6c:
if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2);
That's because I forgot to add braces. D'oh! I can't trigger any
out-of-bounds access or that Oops with them properly in place. So I
let myself get fooled by a daring coding style. :-|
Signed-off-by: Alex Guo <alexguo1023@xxxxxxxxx>
---
Fix buffer underflow in xdl_build_script
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-
git-1976%2Fmugitya03%2Fbuf-1-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-
git-1976/mugitya03/buf-1-v1
Pull-Request: https://github.com/git/git/pull/1976
xdiff/xdiffi.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 5a96e36dfbe..2e983965328 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t
**xscr) {
* Trivial. Collects "groups" of changes and creates an edit
script.
Trivial for Davide perhaps (libxdiff author), but not my mushy brain..
*/
for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >=
0; i1--, i2--)
Should the || be a && instead? From a birds-eye view I would assume we
can stop scanning for changes when we exhaust (reach the top) of either
side. We just have to make sure everything from the other side is
accounted for in the last added change.
- if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
- for (l1 = i1; rchg1[i1 - 1]; i1--);
- for (l2 = i2; rchg2[i2 - 1]; i2--);
+ if ((i1 > 0 && rchg1[i1 - 1]) ||
+ (i2 > 0 && rchg2[i2 - 1])) {
+ for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--);
+ for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--);
Nit: The indentation of that line is off.
if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 -
i2))) {
xdl_free_script(cscr);
base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0