Jon Forrest <nobozo@xxxxxxxxx> writes:

> P.S. I'm trying to read the Git source code to get a better handle
> on what actually goes on in the index but this is taking some time.

Depending on the style of the learner, I often recommend reading the
very initial revision of Git, i.e. e83c5163 (Initial revision of
"git", the information manager from hell, 2005-04-07), to quickly
get a feel for what the various pieces are and how they fit
together, by doing

    $ git checkout -b initial e83c5163316f89bfb

This would give you a mere 1244 lines spread across 11 files, which
is something that can be read from cover to cover in a single
sitting to see how the various data structures relate to each other
and interact.

In the past 20 years, we of course have added features and auxiliary
data structures, and the various details of the implementation have
changed, but the really core part of the concept hasn't drifted too
far from the original.  For example, the fact that the index is
first read into core, that each entry is represented as a
cache_entry in-core structure, that the code accesses them via an
array active_cache[], and that the array is sorted by pathname,
hasn't changed.

In the 3-4 months that followed that initial revision, we added
higher-stage entries that are used to represent a merge in progress
(together with sorting rules for them), and later we added
prefix-compression for the pathnames, but the basic structure of the
index subsystem hasn't changed all that much over the years.
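
To make the "array of entries sorted by pathname" idea concrete,
here is a minimal sketch in C.  It is not the code from e83c5163 or
from current Git: the cache_entry here is stripped down to just a
pathname (the real one also carries stat data and an object name),
MAX_ENTRIES is an arbitrary fixed capacity, and the helper names
name_pos() and add_entry() are made up for illustration.  It only
shows how a sorted array lets you find an entry, or the place where
a new one belongs, with a binary search.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified stand-in for an in-core index entry.  Only the
 * pathname, which serves as the sort key, is kept in this sketch.
 */
struct cache_entry {
	const char *name;
};

#define MAX_ENTRIES 64	/* arbitrary capacity, assumption of this sketch */

static struct cache_entry *active_cache[MAX_ENTRIES];
static int active_nr;

/*
 * Binary search over the sorted array.  Returns the position of
 * "name" if it is present; otherwise returns -(pos) - 1, where pos
 * is where an entry with that name would have to be inserted.
 */
static int name_pos(const char *name)
{
	int first = 0, last = active_nr;

	while (first < last) {
		int next = first + (last - first) / 2;
		int cmp = strcmp(name, active_cache[next]->name);

		if (!cmp)
			return next;
		if (cmp < 0)
			last = next;
		else
			first = next + 1;
	}
	return -first - 1;
}

/*
 * Add an entry while keeping the array sorted.  An existing entry
 * with the same pathname is replaced in place.
 */
static void add_entry(struct cache_entry *ce)
{
	int pos = name_pos(ce->name);

	if (pos >= 0) {
		active_cache[pos] = ce;
		return;
	}
	pos = -pos - 1;
	assert(active_nr < MAX_ENTRIES);
	memmove(active_cache + pos + 1, active_cache + pos,
		(active_nr - pos) * sizeof(*active_cache));
	active_cache[pos] = ce;
	active_nr++;
}
```

Adding "cache.h", "Makefile" and "README" in that order leaves the
array as Makefile, README, cache.h, because strcmp() compares bytes
and uppercase letters sort before lowercase ones in ASCII.  A lookup
for a path that is not there comes back negative, and the caller can
recover the insertion point from it.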