On Fri, Apr 11, 2025 at 03:58:03PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
>
> > Cached objects are virtual objects that can be set up without writing
> > anything into the object store directly. This mechanism for example
> > allows us to create fake commits in git-blame(1).
> >
> > The cached objects are stored in a global variable. Refactor the code so
> > that we instead store the array as part of the raw object store. This is
> > another step into the direction of libifying our object database.
>
> While we do need some execution context object to hang these virtual
> objects, once we decide that it cannot be global, I am not sure if
> repository objects are good home for them. If your application
> running in a repository needs to give one object name to a virtual
> object, and then that same application wants to access a submodule
> of that repository in the same process image, wouldn't you have one
> in-core repository object for the top-level superproject, and one
> for each submodule? If a submodule commit bound to a path in the
> superproject's tree is a virtual "pretend" commit object or if it
> has a virtual "pretend" tree object, don't you need to expose these
> to both submodule and superproject repositories, if your application
> wants to seamlessly cross the module boundary (think "git grep
> --recurse-submodules" or something)?
>
> For now, as long as the_repository is being used as that "execution
> context object", and not a repository instance passed along the call
> chain, then the globalness of these virtual objects is maintained,
> so this change will not cause breakage (e.g., such an application
> may want to pick up the virtual object from the repository instance
> for the superproject and it may find it, but when traversing down to
> a submodule, the same virtual object may not be found in the
> repository instance for the submodule it descended into and working
> in, if you make it per repository and pass repository instance
> around along the call chain). But eventually somebody will start
> saying "let's remove USE_THE_REPOSITORY_VARIABLE", at which point I
> am not sure how subtle such a bug would become.

I think the answer is very much "it depends". I can think of use cases
where it might be the right thing to pretend that objects exist
globally, but there are also use cases where I think it makes sense to
treat them as repository-specific. The thing is: we can do the former if
the virtual objects are specific to a repository, but we can't do the
latter if the virtual objects are global.

As far as I can see we only use this mechanism in git-blame(1) right now
to create a fake working tree commit. This mechanism does not cross into
submodules at all, and if it did I think we would want to create
separate fake working tree commits anyway: one for the parent
repository, and one for each submodule. So converting this mechanism to
be local to the repository (or rather local to an object store) feels
like the right thing to do to me.

But I agree with you in principle: we will have to be a lot more mindful
going forward when it comes to handling multiple repositories in-memory.
We don't do this well right now, but as we convert more and more code so
that it doesn't use `the_repository` anymore we'll have to become better
at this.
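
To illustrate the asymmetry between per-store and global virtual
objects, here is a minimal sketch. All types and function names are
hypothetical stand-ins and not Git's actual structures or API; object
names are plain strings instead of object IDs. With per-store state, a
"global" pretend object is just one that gets registered in every store,
whereas a single global table could not be narrowed to one store.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types; these are not Git's actual structures. */
struct virtual_object {
	const char *name;	/* stand-in for an object ID */
	const char *data;
};

struct object_store {
	struct virtual_object objects[8];
	size_t nr;
};

/* Register a virtual object with a single store; returns 0 on success. */
static int store_add_virtual(struct object_store *store,
			     const char *name, const char *data)
{
	if (store->nr == sizeof(store->objects) / sizeof(store->objects[0]))
		return -1;
	store->objects[store->nr].name = name;
	store->objects[store->nr].data = data;
	store->nr++;
	return 0;
}

/* Look up a virtual object in one store only; NULL if it is not there. */
static const char *store_find_virtual(const struct object_store *store,
				      const char *name)
{
	for (size_t i = 0; i < store->nr; i++)
		if (!strcmp(store->objects[i].name, name))
			return store->objects[i].data;
	return NULL;
}

/* "Global" behaviour emulated on top of per-store state. */
static int add_virtual_everywhere(struct object_store **stores, size_t nr,
				  const char *name, const char *data)
{
	for (size_t i = 0; i < nr; i++)
		if (store_add_virtual(stores[i], name, data) < 0)
			return -1;
	return 0;
}
```
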
From my perspective that isn't only true for these fake working tree
commits, but it's a general thing that we'll have to sort out over time.
It's inherent to the whole libification process.

I think for the most part we're fine right now, as we don't make use of
any of the new capabilities that libification brings with it in theory.
But once use cases start to come up that _do_ make use of this we will
have to think about those issues a whole lot more carefully.

Patrick