On 2025-08-27 at 19:08:16, Eric Wong wrote:
> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> wrote:
> > TL;DR: We need a different datastore than a flat file for storing
> > mappings between SHA-1 and SHA-256 in compatibility mode. Advice and
> > opinions sought.
>
> <snip>
>
> > Our approach for mapping object IDs between algorithms uses data in pack
> > index v3 (outlined in the transition document), plus a flat file called
> > `loose-object-idx` for loose objects. However, we didn't anticipate
> > that we'd need to handle mappings long-term for data that is neither a
> > loose object nor a packed object.
> >
> > For instance, with shallow clones, we must store a mapping for the
> > shallows the server has sent us[1], since we lack the history to convert
> > objects otherwise. Similarly, if there are submodules or we're using a
> > partial clone, we must store those mappings as well, since we cannot
> > convert trees without them. We can store them in the
> > `loose-object-idx`, but since it's not sorted or easily searchable, it's
> > going to perform really terribly when we store enough of them. Right
> > now, we read the entire file into two hashmaps (one in each direction)
> > and we sometimes need to re-read it when other processes add items, so
> > it won't take much to make it be slow and take a lot of memory.
>
> This really seems ideal for SQLite, which has come a long way
> since 2005 when git started.
>
> I really wish git would've relied on more on existing formats
> (e.g. LMDB refs) rather than introducing more one-off data
> formats that require more cognitive overhead to document and
> learn[1], especially when SQLite is extremely portable and works
> on tiny devices.

SQLite is not an option because it performs poorly with Java and we want
our formats to work with other implementations, like JGit. That's why
we created reftable instead of using SQLite.

Also, in general, I'm not interested in being tied to a single
implementation. If the developers of SQLite decide to dramatically
change the license of all their code like Oracle did with Berkeley DB,
we're going to have a problem. Yes, we can use the older versions, but
we'd still need people to maintain the library and update it.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA