On Wed, May 07, 2025 at 10:02:12AM -0700, Junio C Hamano wrote:
> Derrick Stolee <stolee@xxxxxxxxx> writes:
>
> > Patches 1 and 2 involve renaming some core structures, and I had
> > some questions around these names (since we hope to be stuck with
> > the new names for a long time). I was thinking out loud on a per-
> > patch basis, but now want to collect my thoughts around these:
> >
> > * raw_object_store currently describes the abstraction that contains
> >   all objects that can be accessed within the repository. This may
> >   include multiple alternates. Patch 1 renames this to
> >   'object_database'.
> >
> > * object_directory currently describes a single directory that
> >   has the same structure as $GIT_DIR/objects/ but may be an alternate
> >   or a submodule object directory. Patch 2 renames this to
> >   'odb_backend'.
> >
> > My concerns around this are basically around not liking "backend" for
> > this purpose. When I think of a backend, I'm thinking about the
> > implementation details (like the refs backend being files or reftable)
> > and not multiple distinct locations that have their own objects.
>
> Yup, odb_backend_files (aot odb_backend_redis) or something?

Yeah, that was my vision indeed. I think it works equally well though in
case we name this `odb_alternate`. The benefit of the "alternate"
terminology is that we already use it and it's almost a perfect fit, and
it gives the reader a hint that we may have multiple alternates. On the
other hand, `odb_backend` sounds as if there would only be a single
backend for a `struct object_database`.

So Stolee caused me to reconsider and favor `odb_alternate`. But in the
end I guess that both names would work alright.

> > * 'struct object_directory' could be renamed to 'struct odb_shard' or
> >   'struct odb_slice' or similar. I may even recommend 'odb_partition'
> >   though that does imply some disjointness that is not guaranteed (an
> >   object can exist in multiple parts).
> >
> > * In the event that we create multiple implementations for storing
> >   objects, then a 'struct odb_shard' could point to a backend to help
> >   find the appropriate methods for interacting with its storage.
>
> Hmph, I do not have strong opinions, but I consider it an
> implementation detail of one particular backend, namely, the
> filesystem based backend, that it can link together multiple
> object_directory instances and present them as if they form a single
> object database, just like all files within a single object_directory
> form an illusion of a single object database (aka key-value store) even
> though some objects are stored in individual loose object files while
> many others are packed in a single packfile.
>
> I did not expect you would want to go to the world where a single
> "shard" consists of an object_directory backed by the filesystem and
> some other more database-y backend. It is an interesting idea, but
> we'd need to worry about many things we do not have to worry about
> right now. E.g. what do the precedence rules among different
> components within a single "shard" look like? How do we express "in
> this repository, local filesystem-backed piece is consulted first,
> and then check this piece backed by low-cost but high-latency
> storage backend"?

Well, in fact I want to design this from the start so that you can mix
and match different backends. I think it falls out naturally from the
design if an alternate can be backed by anything, and it has a lot of
very interesting features.

Furthermore, it would cause a bunch of problems if we _didn't_ allow for
this, at least for hosting providers:

  - Migrations would now need to be atomic across fork networks where
    all forks need to be migrated at once so that we don't mix backends.

  - Migrations in general would be a pain if we had to do an atomic
    migration even for a single object directory.
    With mixed backends we can already make a partially-migrated backend
    available while the old backend is still in use.

  - High-latency storage backends may work well for binary files, but
    not for smallish text files.

This all of course still needs to be hashed out. I do want to send an
RFC document to the mailing list soonish, probably in the first half of
the Git 2.51 release cycle, so that we can discuss where to go.

> > I do mention that the rename of the object-store.[c|h] files may be
> > unnecessary, or perhaps could be delayed until this series is merged
> > and the collateral is calmed.
>
> Right now, merge-fix needed against all other topics in flight look
> like this, in order to merge it to 'seen'.

Okay. In that case I'll keep that patch for now.

Patrick
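
For illustration only, here is a rough sketch of how "an alternate can be
backed by anything" might look in C, and how the precedence question could
be answered by list order. Apart from `struct object_database` and
`odb_alternate`, every name below (`odb_backend_ops`, `has_object`,
`odb_find`, `toy_has_object`, the "remote" backend) is hypothetical and not
taken from the patch series, and the in-memory array of object IDs is a toy
stand-in for real storage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct odb_alternate;

/* Hypothetical vtable: one instance per storage implementation. */
struct odb_backend_ops {
	const char *name;
	/* Returns 1 if the alternate contains the object, 0 otherwise. */
	int (*has_object)(const struct odb_alternate *alt, const char *oid);
};

/* One location holding objects; several can be chained together. */
struct odb_alternate {
	const struct odb_backend_ops *ops;
	const char **oids;	/* toy stand-in for the actual storage */
	size_t nr;
	struct odb_alternate *next;
};

/* The whole database: alternates in precedence order, local one first. */
struct object_database {
	struct odb_alternate *alternates;
};

/* Toy lookup shared by both hypothetical backends: a linear scan. */
static int toy_has_object(const struct odb_alternate *alt, const char *oid)
{
	size_t i;

	assert(alt && oid);
	for (i = 0; i < alt->nr; i++)
		if (!strcmp(alt->oids[i], oid))
			return 1;
	return 0;
}

static const struct odb_backend_ops files_ops = { "files", toy_has_object };
static const struct odb_backend_ops remote_ops = { "remote", toy_has_object };

/*
 * Walk the alternates in list order and return the first one that has
 * the object, or NULL if none does. The list order is the precedence.
 */
static const struct odb_alternate *odb_find(const struct object_database *odb,
					    const char *oid)
{
	const struct odb_alternate *alt;

	for (alt = odb->alternates; alt; alt = alt->next)
		if (alt->ops->has_object(alt, oid))
			return alt;
	return NULL;
}
```

In this sketch a caller would chain a "files" alternate ahead of a "remote"
one, so an object present in both is served from the local alternate, while
an object that exists only in the second alternate is still found. A
partially migrated repository would then simply be a list containing both
the old and the new alternate.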