Re: [PATCH 00/17] object-store: carve out the object database subsystem

On Wed, May 07, 2025 at 10:02:12AM -0700, Junio C Hamano wrote:
> Derrick Stolee <stolee@xxxxxxxxx> writes:
> 
> > Patches 1 and 2 involve renaming some core structures, and I had
> > some questions around these names (since we hope to be stuck with
> > the new names for a long time). I was thinking out loud on a per-
> > patch basis, but now want to collect my thoughts around these:
> >
> >  * raw_object_store currently describes the abstraction that contains
> >    all objects that can be accessed within the repository. This may
> >    include multiple alternates. Patch 1 renames this to
> >    'object_database'.
> >
> >  * object_directory currently describes a single directory that
> >    has the same structure as $GIT_DIR/objects/ but may be an alternate
> >    or a submodule object directory. Patch 2 renames this to
> >    'odb_backend'.
> >
> > My concerns around this are basically around not liking "backend" for
> > this purpose. When I think of a backend, I'm thinking about the
> > implementation details (like the refs backend being files or reftable)
> > and not multiple distinct locations that have their own objects.
> 
> Yup, odb_backend_files (aot odb_backend_redis) or something?

Yeah, that was my vision indeed. I think it works equally well if we
name this `odb_alternate`, though. The benefit of the "alternate"
terminology is that we already use it and it's almost a perfect fit, and
it gives the reader a hint that there may be multiple alternates.
`odb_backend`, on the other hand, sounds as if there would only be a
single backend per `struct object_database`.

So Stolee caused me to reconsider and favor `odb_alternate`. But in the
end I guess that both names would work alright.

> >  * 'struct object_directory' could be renamed to 'struct odb_shard' or
> >    'struct odb_slice' or similar. I may even recommend 'odb_partition'
> >    though that does imply some disjointness that is not guaranteed (an
> >    object can exist in multiple parts).
> >
> >  * In the event that we create multiple implementations for storing
> >    objects, then a 'struct odb_shard' could point to a backend to help
> >    find the appropriate methods for interacting with its storage.
> 
> Hmph, I do not have strong opinions, but I consider it an
> implementation detail of one particular backend, namely, the
> filesystem based backend, that it can link together multiple
> object_directory instances and present them as if they form a single
> object database, just like all files within a single object_directory
> form an illusion of a single object database (aka key-value store) even
> though some objects are stored in individual loose object files while
> many others are packed in a single packfile.
> 
> I did not expect you would want to go to the world where a single
> "shard" consists of an object_directory backed by the filesystem and
> some other more database-y backend.  It is an interesting idea, but
> we'd need to worry about many things we do not have to worry about
> right now.  E.g. what do the precedence rules among different
> components within a single "shard" look like?  How do we express "in
> this repository, local filesystem-backed piece is consulted first,
> and then check this piece backed by low-cost but high-latency
> storage backend"?

Well, in fact I want to design this from the start so that you can mix
and match different backends. I think it falls out naturally from the
design if an alternate can be backed by anything, and it enables a lot
of very interesting use cases.
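
To make the mix-and-match idea a bit more concrete, here is a rough
sketch of one way it could look. This is purely illustrative: every
name in it (`odb_backend_ops`, `odb_find_alternate()`, the toy lookup
function and the "files"/"remote" backends) is hypothetical and not
taken from the series or from Git's current code. It only shows
alternates that each carry their own backend operations, consulted in
list order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of this is Git's actual API. */
struct odb_alternate;

/* Per-backend operations, i.e. the implementation-detail side. */
struct odb_backend_ops {
	const char *name;
	/* Returns 1 if the alternate holds the object, 0 otherwise. */
	int (*has_object)(struct odb_alternate *alt, const char *oid_hex);
};

/* One source of objects; several may be chained per database. */
struct odb_alternate {
	const struct odb_backend_ops *ops;
	void *data;                 /* backend-private state */
	struct odb_alternate *next; /* consulted after this one */
};

struct object_database {
	struct odb_alternate *alternates; /* list order is precedence */
};

/* Toy backend: `data` is a NULL-terminated array of object IDs. */
static int toy_has_object(struct odb_alternate *alt, const char *oid_hex)
{
	const char **ids = alt->data;

	for (; *ids; ids++)
		if (!strcmp(*ids, oid_hex))
			return 1;
	return 0;
}

/* Two "backends" that share the toy lookup, for illustration only. */
static const struct odb_backend_ops files_ops = { "files", toy_has_object };
static const struct odb_backend_ops remote_ops = { "remote", toy_has_object };

/* Return the first alternate that has the object, or NULL if none does. */
static struct odb_alternate *odb_find_alternate(struct object_database *odb,
						const char *oid_hex)
{
	struct odb_alternate *alt;

	assert(odb && oid_hex);
	for (alt = odb->alternates; alt; alt = alt->next)
		if (alt->ops->has_object(alt, oid_hex))
			return alt;
	return NULL;
}
```

In this sketch the caller never needs to know which backend answered;
precedence is simply the order of the list. Whether that is the right
way to express precedence is one of the things that still needs to be
discussed.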

Furthermore, it would cause a bunch of problems if we _didn't_ allow for
this, at least for hosting providers:

  - Migrations would now need to be atomic across fork networks where
    all forks need to be migrated at once so that we don't mix backends.

  - Migrations in general would be a pain if we had to do an atomic
    migration even for a single object directory. With mixed backends we
    can already make a partially-migrated backend available while the
    old backend is still in use.

  - High-latency storage backends may work well for binary files, but
    not for smallish text files, so being forced onto a single backend
    would penalize one or the other.

All of this still needs to be hashed out, of course. I do want to send
an RFC document to the mailing list soonish, probably in the first half
of the Git 2.51 release cycle, so that we can discuss where to go.

> > I do mention that the rename of the object-store.[c|h] files may be
> > unnecessary, or perhaps could be delayed until this series is merged
> > and the collateral is calmed.
> 
> Right now, merge-fix needed against all other topics in flight look
> like this, in order to merge it to 'seen'.

Okay. In that case I'll keep that patch for now.

Patrick



