Re: Question About Sorting the Index

Junio C Hamano <gitster@xxxxxxxxx> · Sat, 17 May 2025 11:36:39 -0700

Jon Forrest <nobozo@xxxxxxxxx> writes:

> P.S. I'm trying to read the Git source code to get a better handle
> on what actually goes on in the index but this is taking some time.

Depending on the style of the learner, I often recommend reading the
very initial revision of Git, i.e.  e83c5163 (Initial revision of
"git", the information manager from hell, 2005-04-07), to quickly
get a feel of what various pieces there are and how they fit
together, by doing

    $ git checkout -b initial e83c5163316f89bfb

This would give you a mere 1244 lines spread across 11 files, which
is small enough to read from cover to cover in a single sitting and
see how the various data structures relate to each other and
interact.  In the past 20 years, we of course have added features
and auxiliary data structures, and various details of the
implementation have changed, but the core of the concept hasn't
drifted too far from the original.

For example, the fact that the index is first read into core, that
each entry is represented as an in-core cache_entry structure, that
the code accesses them via an array active_cache[], and that the
array is sorted by pathname, hasn't changed.  In the 3-4 months that
followed that initial revision, we added higher-stage entries that
are used to represent a merge in progress (together with sorting
rules for them), and later we added prefix-compression for the
pathnames, but the basic structure of the index subsystem hasn't
changed all that much over the years.
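
To make the "sorted array of entries" idea concrete, here is a rough
sketch of such a structure.  It is not the actual Git code: the names
(struct entry, active[], entry_pos(), add_entry()), the fixed-size
array, and the "negative return encodes the insertion point"
convention are all assumptions made for illustration.  Entries are
ordered by pathname first and by stage second, lookups are done with
a binary search, and insertion keeps the array sorted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for an in-core index entry. */
struct entry {
	const char *name;	/* pathname */
	int stage;		/* 0 = normal, 1..3 = merge in progress */
};

#define MAX_ENTRIES 64

struct entry *active[MAX_ENTRIES];	/* always kept sorted */
int active_nr;

/* Order by pathname (byte-wise) first, then by stage. */
int entry_cmp(const char *name1, int stage1,
	      const char *name2, int stage2)
{
	int cmp = strcmp(name1, name2);
	if (cmp)
		return cmp;
	return stage1 - stage2;
}

/*
 * Binary search over the sorted array.  Returns the position when
 * found, otherwise -(insertion point) - 1.
 */
int entry_pos(const char *name, int stage)
{
	int first = 0, last = active_nr;

	while (first < last) {
		int mid = first + (last - first) / 2;
		int cmp = entry_cmp(name, stage,
				    active[mid]->name, active[mid]->stage);
		if (!cmp)
			return mid;
		if (cmp < 0)
			last = mid;
		else
			first = mid + 1;
	}
	return -first - 1;
}

/* Insert while keeping the array sorted; replace an exact match. */
void add_entry(struct entry *e)
{
	int pos = entry_pos(e->name, e->stage);

	if (pos >= 0) {
		active[pos] = e;
		return;
	}
	pos = -pos - 1;
	assert(active_nr < MAX_ENTRIES);
	memmove(&active[pos + 1], &active[pos],
		(active_nr - pos) * sizeof(active[0]));
	active[pos] = e;
	active_nr++;
}
```

Because the array stays sorted, adding entries in any order yields
the same layout, and entries for the same path at different stages
end up next to each other.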