Re: [PATCH] git-compat-util: introduce `count_t` typedef

Phillip Wood <phillip.wood123@xxxxxxxxx> · Thu, 7 Aug 2025 15:17:14 +0100

Hi Patrick

On 07/08/2025 10:22, Patrick Steinhardt wrote:
Historically, Git has been very lenient with its use of integer types
and didn't really give much thought into which type to use in what
situation. We interchangeably mix and match signed and unsigned types
and often times blindly convert them. This use has led to several
out-of-bounds reads and writes in the past, some of which could be
turned into arbitrary code execution.

My feeling is that one of the main problems has been using different 
types for loop indexes and loop limits. If we mandated that the loop 
index had to be the same type as the limit that would improve things 
considerably and without mandating a particular type.

A discussion that regularly comes up in this context though is what
types to use for counting entities:

   - One question is whether the type should be signed or unsigned.
     Arguably, the answer should be to use unsigned types as long as we
     know that we never need a negative value, e.g. as a sentinel. This
     helps guide the reader and explicitly conveys the sense that such a
     counter is only ever going to be a non-negative number. Otherwise,
     code would need to be more careful as it may hold negative values.

The counter argument to this is that it is easy to write incorrect loops 
when counting down if the loop variable is unsigned. Using a typedef 
that hides the actual type makes that harder to spot as it is not 
immediately obvious whether the loop index is signed or not. As we have 
cases that do need to store a negative value then we're still left with 
using a mix of signed and unsigned types for counting in our code base.

Introduce a new typedef for `count_t` that is of type `uintptr_t` to
give clear guidance what type to use for counting entities. This type
was chosen because in the worst case, an entity may be a single byte and
we fill all of our memory with these entities. As `uintptr_t` is
guaranteed to hold at least the value of a pointer, we know that it
could be used to index into every single such entity.

How many sites actually allocate anything like that number of 
entities?Generally we use ALLOC_GROW() or ALLOC_GROW_BY() which means 
that we're not normally counting bytes. ALLOC_GROW_BY() assumes the 
number of entities fits into a size_t so should be be changing that to 
use count_t? If we're worried about overflows then maybe we should look 
at alloc_nr() which calculates the new allocation with

    (nr + 16) * 3 / 2

which which will start overflowing long before we starting allocating 
UINTPTR_MAX single byte entities.

Thanks

Phillip