On 2025-06-20 at 14:55:12, Junio C Hamano wrote: > "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes: > > > When we define a new repository format with REPOSITORY_FORMAT_INIT, we > > always use GIT_HASH_SHA1, and this value ends up getting used as the > > default value to initialize a repository if none of the command line, > > environment, or config tell us to do otherwise. > > > > Because we might not always want to use SHA-1 as the default, let's > > instead specify the default hash algorithm constant so that we will use > > whatever the specified default is. > > All of the above hints the use of _DEFAULT instead of _SHA1 but ... > > > However, to make sure we continue to > > read repositories without a specified hash algorithm as SHA-1, default > > the repository format to the original hash algorithm (SHA-1) when > > reading the repository format. > > ... this explains why we may want to > > - expect that we would be able to read the hash from repository > - read from repository > - if the repository specifies the hash, happily use it > - otherwise it is a lagacy repository so use the SHA-1 > > Is that what is going on here? Because I find some things that > happens in the code somewhat questionable. Yes, that's roughly it. I'll explain this better in v2. > > Signed-off-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> > > --- > > setup.c | 5 ++++- > > setup.h | 2 +- > > 2 files changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/setup.c b/setup.c > > index 641c857ed5..fb38824a2b 100644 > > --- a/setup.c > > +++ b/setup.c > > @@ -835,9 +835,12 @@ static void init_repository_format(struct repository_format *format) > > int read_repository_format(struct repository_format *format, const char *path) > > { > > clear_repository_format(format); > > + format->hash_algo = GIT_HASH_ORIGINAL; > > If we expect we can read from the config, and otherwise we fall back > to the hardcoded legacy SHA-1, do we need this assignment? We > cleared and version is set to -1, and then we read from the config ... > > > git_config_from_file(check_repo_format, path, format); > > ... so if the file said anything about "extensions.objectformat", we > would know it by now. If not, wouldn't version be left to -1 as our > previous clal to clear_repository_format() set it via its call to > init_repository_format()? The subtle behaviour here is that -1 means that either there is no version specified or that there's no config file. I was surprised to learn that we do not require a configuration file, but we have tests for that case with `git branch`. > Ahh, this code is prepared to handle a repository that claims to use > version 1 but does not set extensions.objectformat at all. And in > such a case, we do want to use SHA-1. OK, the above code makes > sense for that case. Correct. We can set v1 because we want reftable, for instance, and we never set extensions.objectFormat to "sha1". We always rely on the default behaviour since that's more compatible. > > - if (format->version == -1) > > + if (format->version == -1) { > > And if there is no core.repositoryformatversion set, we will come > here. According to the comment before handle_extension_v0(), some > extensions.* should still be honored even in such a repository, and > the above call to git_config_from_file() should have handled them > just fine. > > However, I do not understand why we clear all of what we read with > another call to clear_repository_format() here. Because this is the case where there's no config file. If nobody bothered to write a configuration file, then we want to reset everything to the default. I don't know what we do if we have a repository with a config file and no version, but literally every repository since Git 0.99.3 (I believe) has core.repositoryformatversion written into the repo. I'm certain that the behaviour we'd want if nobody specified one was to do the most compatible thing, so the defaults seem prudent. > Admittedly, I find that the way how check_repo_format() does its > thing somewhat questionable. Even though it reads into the .version > member the value of core.repositoryformatversion, it slurps in all > the extensions.* regardless of what .version the repository claims > to be in. So even though there are two separate functions to handle > "historical compatibility" handle_extension_v0() and other extensions, > we still end up honoring extensions.objectformat in a repository that > does not say what format version it uses. And clearing them with a > call to clear_repository_format() may make sense, but do we want to > clear things we read with handle_extension_v0() as well? No, it looks like that at first, but I don't think that's correct. `verify_repository_format` complains if we have any v1-only extensions (such as extensions.objectFormat) and we have version 0. That's also where we check whether there are unknown extensions in v1, since we must refuse to read the repository in that case. There are tests for this case: I added some that v0 with extensions.objectFormat is rejected when I did the SHA-256 work, and I think Peff and Patrick may have done some additional cleanup here (and maybe others; my apologies if I've forgotten anyone). The reason we need to read all the extensions is that different config options aren't ordered and the config callback processes items we see in the order they're in in the config file. We might first have an extensions block and only then a core block, so when reading the extensions we don't know yet what the repository version is. -- brian m. carlson (they/them) Toronto, Ontario, CA
Attachment:
signature.asc
Description: PGP signature