Re: [PATCH] Fix buffer underflow in xdl_build_script

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 24/05/2025 10:08, René Scharfe wrote:
Am 24.05.25 um 07:57 schrieb René Scharfe:
Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget:
From: jinyaoguo <guo846@xxxxxxxxxx>

The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing
`i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or
`rchg2[i2-1]`), which underflows the buffer.
This commit adds explicit `i1 > 0` and `i2 > 0` checks around
those array accesses to prevent invalid negative indexing.

xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both
ends for rchg arrays, so an index of -1 should be within the bounds.

and rchg[-1] == 0 so i1 and i2 can never drop below -1

i1 and i2 are decreased in lockstep, though, so one of them can become
smaller than -1 if nrec is different between the files.  And that's how
this code run can indeed run off into the weeds.

Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice
versa.  Or can it?  It makes sense that we reach the start of both
buffers at the same time if we walk backwards from the end, don't
misstep and have consistent rchg array contents, but I'm not sure.
The code looks like

for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--)
		if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
			for (l1 = i1; rchg1[i1 - 1]; i1--);
			for (l2 = i2; rchg2[i2 - 1]; i2--);

I think I've convinced myself that it is safe assuming there are an equal number of unchanged lines in rchg1 and rchg2 and rchg1[-1] == rchg2[-1] == 0. Each iteration consumes any changed lines in the preimage and the postimage plus a single context line from each (apart from the final iteration when the context lines may have been exhausted). At the start of the last iteration there are three possibilities

 - i1 == -1 && i2 >= 0 => there are insertions as the start of the file.
   As the context lines in the preimage have been exhausted all the
   remaining rchg2 elements represent added lines and are consumed by
   for (l2 = i2; rchg2[i2 - 1]; i2--) so at the end of the loop body
   i2 == 0 and the outer loop will exit.

 - i1 >= 0 && i2 == -1 => there are deletions at the start of the file.
   As the context lines in the postimage have been exhausted all the
   remaining rchg1 elements represent deleted lines and are consumed by
   for (l1 = i1; rchg1[i1 - 1]; i1--) so at the end of the loop body
   i1 == 0 and the outer loop will exit.

 - i1 >= 0 && i2 >= 0 => the first line is unchanged or there are
   insertions and deletions at the beginning of the file. At the end of
   the loop body i1 == 0 && i2 == 0 and the outer loop will exit.

We could add

    if (i1 < -1 || i2 < -1)
        BUG("mismatched context line count");

before "if (rchg1[i1 - 1] || rchg2[i2 - 1])" inside the loop if we're worried about bugs that break the assumption that there are equal numbers of context lines on each side. Any such bug would generate invalid diffs. I don't know how likely that is to happen in practice.

Best Wishes

Phillip


Are you able to demonstrate any out-of-bounds access with e.g.,
Valgrind, AddressSanitizer or an assertion?

Curiously, AddressSanitizer doesn't report anything, but if I add the
following line after the outer for, I can trigger it to report a
heap-buffer-overflow with e.g., git show 8613c2bb6c:

	if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2);

That's because I forgot to add braces.  D'oh!  I can't trigger any
out-of-bounds access or that Oops with them properly in place.  So I
let myself get fooled by a daring coding style. :-|



Signed-off-by: Alex Guo <alexguo1023@xxxxxxxxx>
---
     Fix buffer underflow in xdl_build_script

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1976%2Fmugitya03%2Fbuf-1-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1976/mugitya03/buf-1-v1
Pull-Request: https://github.com/git/git/pull/1976

  xdiff/xdiffi.c | 7 ++++---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 5a96e36dfbe..2e983965328 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) {
  	 * Trivial. Collects "groups" of changes and creates an edit script.

Trivial for Davide perhaps (libxdiff author), but not my mushy brain..

  	 */
  	for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--)

Should the || be a && instead?  From a birds-eye view I would assume we
can stop scanning for changes when we exhaust (reach the top) of either
side.  We just have to make sure everything from the other side is
accounted for in the last added change.

-		if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
-			for (l1 = i1; rchg1[i1 - 1]; i1--);
-			for (l2 = i2; rchg2[i2 - 1]; i2--);
+		if ((i1 > 0 && rchg1[i1 - 1]) ||
+			(i2 > 0 && rchg2[i2 - 1])) {
+			for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--);
+            for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--);

Nit: The indentation of that line is off.

if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) {
  				xdl_free_script(cscr);

base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux