Re: [PATCH v3] diff: check range before dereferencing an array element

Jeff King <peff@xxxxxxxx> · Tue, 29 Apr 2025 19:33:14 -0400

On Tue, Apr 29, 2025 at 03:37:35PM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> >> @@ -892,7 +892,7 @@ static void fill_es_indent_data(struct emitted_diff_symbol *es)
> >>  
> >>  	/* skip any \v \f \r at start of indentation */
> >>  	while (s[off] == '\f' || s[off] == '\v' ||
> >> -	       (s[off] == '\r' && off < len - 1))
> >> +	       (off < len - 1 && s[off] == '\r'))
> >>  		off++;
> >
> > ...since the same pattern exists for the other s[off] checks, is it
> > worth future-proofing this like:
> >
> >   while (off < len - 1 &&
> >          (s[off] == '\f' || s[off] == '\v' || s[off] == '\r')
> >
> > ?
> 
> But doesn't it change the semantics?
> 
> s[off] == '\f', even if off is at the end of the string, i.e. (off
> == len - 1), must trigger the off++ increment.
> 
> On the other hand, CR that is the part of CRLF at the end of line is
> *not* treated like other funny whitespace control characters.  This
> "is off not at the end of line, if so check CR" comparison is about
> that.

Ah, you're right. I was reading the offset check as "are we past the end
of string" (guided by CodeQL's complaint), and if that were the case the
logic would apply equally to all values we are checking.

But that is not what is going on at all. The offset check is for "len -
1", and so is "do not do this one CR match for the final character of
the string". And thus applying it elsewhere is wrong.

And CodeQL's false positive is doubly wrong. We do not even need to say
"the string is NUL-terminated, so it is OK in this case to look past the
end-of-string". The check is not even a string bounds check at all.

-Peff