Question About Sorting the Index

Jon Forrest <nobozo@xxxxxxxxx> · Fri, 16 May 2025 16:43:37 -0700

I've learned that entries in the index file "are
sorted in ascending order on the name field".

Am I right in thinking that this means that
every time a file is added to the index by
running "git add" the whole index file must
be resorted? If so, this seems like a lot of
work, especially since not all the entries
are the same size.
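
To make the question concrete, here is a minimal Python sketch of
two ways a sorted list of names could be kept sorted when one name
is added. It is only an illustration, not Git's code, and both
function names are hypothetical.

```python
import bisect


def add_by_resorting(names, new_name):
    """Append the new name, then re-sort the whole list."""
    names.append(new_name)
    names.sort()


def add_by_insertion(names, new_name):
    """Binary-search for the slot and insert there.

    Names that are already present are left alone.
    """
    pos = bisect.bisect_left(names, new_name)
    if pos == len(names) or names[pos] != new_name:
        names.insert(pos, new_name)


resorted = ["Makefile", "README", "src/main.c"]
inserted = list(resorted)
add_by_resorting(resorted, "doc/guide.txt")
add_by_insertion(inserted, "doc/guide.txt")
print(resorted == inserted)
```

Both routes end with the same ordering. The difference is only in
how much comparing and moving is done to get there.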

Has any thought been given to improving this,
such as perhaps having an "index index"? This
would be a separate file that contains the name
field of each entry, the location of where the entry
starts in the index, and the length of the entry.
I'll call this a partial index entry.
The "index index" would also be sorted by the name field.
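
A rough Python sketch of what I mean by a partial entry follows.
The names PartialEntry, build_index_index, and find are made up for
this example, and a full entry is modeled as a name followed by an
opaque payload, which is an assumption rather than the real on-disk
layout.

```python
from dataclasses import dataclass


@dataclass(order=True)
class PartialEntry:
    name: bytes    # name field copied from the full entry
    offset: int    # where the full entry starts in the index
    length: int    # size of the full entry in bytes


def build_index_index(full_entries):
    """Build the sorted "index index" from unsorted (name, payload) pairs."""
    partial, offset = [], 0
    for name, payload in full_entries:
        size = len(name) + len(payload)
        partial.append(PartialEntry(name, offset, size))
        offset += size
    partial.sort()  # ordered by name first, since name is the first field
    return partial


def find(partial, name):
    """Binary search on the name field; returns None if absent."""
    lo, hi = 0, len(partial)
    while lo < hi:
        mid = (lo + hi) // 2
        if partial[mid].name < name:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(partial) and partial[lo].name == name:
        return partial[lo]
    return None


entries = [(b"src/main.c", b"x" * 10), (b"Makefile", b"y" * 4), (b"README", b"z" * 6)]
index_index = build_index_index(entries)
print([e.name for e in index_index])
print(find(index_index, b"README"))
```

The full entries stay in whatever order they were written, and only
the small partial entries are kept in name order.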

With this approach, running "git add" would simply
append a full index entry to the index, and
append the partial entry to the "index index", which
would then be sorted. The full index would not be
sorted. I'm guessing this is the common path.
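
The add path described above could be sketched like this in Python.
SplitIndex and its methods are hypothetical names, the payload
stands in for everything in a full entry besides the name, and
re-adding a name that already exists is not handled because the
proposal does not cover it.

```python
import bisect


class SplitIndex:
    def __init__(self):
        self.full = bytearray()  # full entries, append-only, unsorted
        self.partial = []        # sorted (name, offset, length) tuples

    def add(self, name, payload):
        """Append the full entry, append the partial entry, sort the latter."""
        record = name + payload
        self.partial.append((name, len(self.full), len(record)))
        self.full += record
        self.partial.sort()      # only the small "index index" is sorted

    def read(self, name):
        """Look the name up in the "index index", then fetch the full entry."""
        pos = bisect.bisect_left(self.partial, (name,))
        if pos < len(self.partial) and self.partial[pos][0] == name:
            _, offset, length = self.partial[pos]
            return bytes(self.full[offset:offset + length])
        return None


idx = SplitIndex()
idx.add(b"src/main.c", b"AAAA")
idx.add(b"Makefile", b"BB")
print([name for name, _, _ in idx.partial])
print(idx.read(b"Makefile"))
```

The sort in add() could just as well be a binary search plus an
insert, since only one partial entry is out of place at that point.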

To delete a file from the index, I'd propose adding a
"deleted" bit to the full index entry. When "git rm --cached"
is run, two things would happen:

1) The "deleted" bit would be turned on in the full index
entry for the file. The index itself will not be sorted.
Every so often, perhaps when "git fsck" is run, these
entries could be deleted. The full index won't have
to be resorted when this happens because it won't be
assumed to be in sorted order any longer.

2) The "index index" would be modified by removing the
partial entry for the file. This could be done by
writing the partial entries up to the entry being
deleted, and then the entries following. No sort would
be necessary because the "index index" is already sorted.
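
Both steps, plus the occasional cleanup, could look roughly like
the Python below. This is a sketch under simplifying assumptions:
the full index is a list of dicts, the "index index" refers to
entries by list slot rather than byte offset, and remove_cached and
compact are invented names.

```python
def remove_cached(full, partial, name):
    """Flag the full entry as deleted and rewrite the "index index" without it."""
    kept = []
    for entry_name, slot in partial:
        if entry_name == name:
            full[slot]["deleted"] = True      # step 1: set the bit, move nothing
        else:
            kept.append((entry_name, slot))   # step 2: copy the other entries
    return kept                               # still sorted, so no sort is needed


def compact(full):
    """Occasional cleanup: drop the flagged entries from the full index.

    In this model the slots shift when entries are dropped, so the
    "index index" is rebuilt from the surviving entries.
    """
    live = [e for e in full if not e["deleted"]]
    partial = sorted((e["name"], slot) for slot, e in enumerate(live))
    return live, partial


full = [
    {"name": "src/main.c", "deleted": False},
    {"name": "Makefile", "deleted": False},
    {"name": "README", "deleted": False},
]
partial = [("Makefile", 1), ("README", 2), ("src/main.c", 0)]

partial = remove_cached(full, partial, "README")
print(partial)
full, partial = compact(full)
print([e["name"] for e in full], partial)
```

One thing the sketch makes visible is that the cleanup changes
where the surviving full entries live, so the locations stored in
the "index index" have to be refreshed at that point.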

One drawback of this approach would be that since the "index index"
entries also won't be the same length, sorting it will still require
extra work. However, this wouldn't be any harder than sorting the full
index, and a lot less data would have to be moved around.

All this is so simple that I suspect that it's been considered before.
Am I missing something?

Cordially,
Jon Forrest

P.S. I'm trying to read the Git source code to get a better handle
on what actually goes on in the index but this is taking some time.