Re: [PATCH 05/10] setup: use the default algorithm to initialize repo format

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 20 Jun 2025 07:55:12 -0700

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:

> When we define a new repository format with REPOSITORY_FORMAT_INIT, we
> always use GIT_HASH_SHA1, and this value ends up getting used as the
> default value to initialize a repository if none of the command line,
> environment, or config tell us to do otherwise.
>
> Because we might not always want to use SHA-1 as the default, let's
> instead specify the default hash algorithm constant so that we will use
> whatever the specified default is.  

All of the above hints the use of _DEFAULT instead of _SHA1 but ...

> However, to make sure we continue to
> read repositories without a specified hash algorithm as SHA-1, default
> the repository format to the original hash algorithm (SHA-1) when
> reading the repository format.

... this explains why we may want to

 - expect that we would be able to read the hash from repository
 - read from repository
 - if the repository specifies the hash, happily use it
 - otherwise it is a lagacy repository so use the SHA-1

Is that what is going on here?  Because I find some things that
happens in the code somewhat questionable.

> Signed-off-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx>
> ---
>  setup.c | 5 ++++-
>  setup.h | 2 +-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/setup.c b/setup.c
> index 641c857ed5..fb38824a2b 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -835,9 +835,12 @@ static void init_repository_format(struct repository_format *format)
>  int read_repository_format(struct repository_format *format, const char *path)
>  {
>  	clear_repository_format(format);
> +	format->hash_algo = GIT_HASH_ORIGINAL;

If we expect we can read from the config, and otherwise we fall back
to the hardcoded legacy SHA-1, do we need this assignment?  We
cleared and version is set to -1, and then we read from the config ...

>  	git_config_from_file(check_repo_format, path, format);

... so if the file said anything about "extensions.objectformat", we
would know it by now.  If not, wouldn't version be left to -1 as our
previous clal to clear_repository_format() set it via its call to
init_repository_format()?

Ahh, this code is prepared to handle a repository that claims to use
version 1 but does not set extensions.objectformat at all.  And in
such a case, we do want to use SHA-1.  OK, the above code makes
sense for that case.

> -	if (format->version == -1)
> +	if (format->version == -1) {

And if there is no core.repositoryformatversion set, we will come
here.  According to the comment before handle_extension_v0(), some
extensions.* should still be honored even in such a repository, and
the above call to git_config_from_file() should have handled them
just fine.

However, I do not understand why we clear all of what we read with
another call to clear_repository_format() here.

>  		clear_repository_format(format);
> +		format->hash_algo = GIT_HASH_ORIGINAL;
> +	}
>  	return format->version;
>  }

Admittedly, I find that the way how check_repo_format() does its
thing somewhat questionable.  Even though it reads into the .version
member the value of core.repositoryformatversion, it slurps in all
the extensions.* regardless of what .version the repository claims
to be in.  So even though there are two separate functions to handle
"historical compatibility" handle_extension_v0() and other extensions,
we still end up honoring extensions.objectformat in a repository that
does not say what format version it uses.  And clearing them with a
call to clear_repository_format() may make sense, but do we want to
clear things we read with handle_extension_v0() as well?

> diff --git a/setup.h b/setup.h
> index 18dc3b7368..8522fa8575 100644
> --- a/setup.h
> +++ b/setup.h
> @@ -149,7 +149,7 @@ struct repository_format {
>  { \
>  	.version = -1, \
>  	.is_bare = -1, \
> -	.hash_algo = GIT_HASH_SHA1, \
> +	.hash_algo = GIT_HASH_DEFAULT, \
>  	.ref_storage_format = REF_STORAGE_FORMAT_FILES, \
>  	.unknown_extensions = STRING_LIST_INIT_DUP, \
>  	.v1_only_extensions = STRING_LIST_INIT_DUP, \