Hi Patrick
On 07/08/2025 10:22, Patrick Steinhardt wrote:
Historically, Git has been very lenient with its use of integer types
and didn't really give much thought into which type to use in what
situation. We interchangeably mix and match signed and unsigned types
and often times blindly convert them. This use has led to several
out-of-bounds reads and writes in the past, some of which could be
turned into arbitrary code execution.
My feeling is that one of the main problems has been using different
types for loop indexes and loop limits. If we mandated that the loop
index had to be the same type as the limit that would improve things
considerably and without mandating a particular type.
A discussion that regularly comes up in this context though is what
types to use for counting entities:
- One question is whether the type should be signed or unsigned.
Arguably, the answer should be to use unsigned types as long as we
know that we never need a negative value, e.g. as a sentinel. This
helps guide the reader and explicitly conveys the sense that such a
counter is only ever going to be a non-negative number. Otherwise,
code would need to be more careful as it may hold negative values.
The counter argument to this is that it is easy to write incorrect loops
when counting down if the loop variable is unsigned. Using a typedef
that hides the actual type makes that harder to spot as it is not
immediately obvious whether the loop index is signed or not. As we have
cases that do need to store a negative value then we're still left with
using a mix of signed and unsigned types for counting in our code base.
Introduce a new typedef for `count_t` that is of type `uintptr_t` to
give clear guidance what type to use for counting entities. This type
was chosen because in the worst case, an entity may be a single byte and
we fill all of our memory with these entities. As `uintptr_t` is
guaranteed to hold at least the value of a pointer, we know that it
could be used to index into every single such entity.
How many sites actually allocate anything like that number of
entities?Generally we use ALLOC_GROW() or ALLOC_GROW_BY() which means
that we're not normally counting bytes. ALLOC_GROW_BY() assumes the
number of entities fits into a size_t so should be be changing that to
use count_t? If we're worried about overflows then maybe we should look
at alloc_nr() which calculates the new allocation with
(nr + 16) * 3 / 2
which which will start overflowing long before we starting allocating
UINTPTR_MAX single byte entities.
Thanks
Phillip