Re: [PATCH v2 1/2] BreakingChanges: announce switch to "reftable" format

Patrick Steinhardt <ps@xxxxxx> writes:

> The "reftable" format has come a long way and has matured nicely since
> it has been merged into git via 57db2a094d5 (refs: introduce reftable
> backend, 2024-02-07). It fixes longstanding issues that cannot be fixed
> with the "files" format in a backwards-compatible way and performs
> significantly better in many use cases.
>
> Announce that we will switch to the "reftable" format in Git 3.0 for
> newly created repositories.
>

Nit: This commit does more than announce the switch. It also makes
reftable the default format when WITH_BREAKING_CHANGES is set. It would
be nice to call that out here.

> This switch is dependent on support in the larger Git ecosystem. Most
> importantly, libraries like JGit, libgit2 and Gitoxide should support
> the reftable backend so that we don't break all applications and tools
> built on top of those libraries.
>
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  Documentation/BreakingChanges.adoc | 44 ++++++++++++++++++++++++++++++++++++++
>  help.c                             |  2 ++
>  repository.h                       |  6 ++++++
>  setup.c                            |  2 ++
>  t/t0001-init.sh                    | 11 ++++++++++
>  5 files changed, 65 insertions(+)
>
> diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc
> index c6bd94986c5..614debcd740 100644
> --- a/Documentation/BreakingChanges.adoc
> +++ b/Documentation/BreakingChanges.adoc
> @@ -118,6 +118,50 @@ Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@xxxxxxxxxxx>,
>  <20170223155046.e7nxivfwqqoprsqj@LykOS.localdomain>,
>  <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@xxxxxxxxxxxxxx>.
>
> +* The default storage format for references in newly created repositories will
> +  be changed from "files" to "reftable". The "reftable" format provides
> +  multiple advantages over the "files" format:
> ++
> +  ** It is impossible to store two references that only differ in casing on
> +     case-insensitive filesystems with the "files" format. This issue is common
> +     on Windows and macOS platforms. As the "reftable" backend does not use
> +     filesystem paths anymore to encode reference names this problem goes away.

Nit: s/anymore// would make this clearer, since reftable never used
filesystem paths in the first place.
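
To make the collision concrete, here is a toy model in C. It is not Git
code, and both helper names are made up for illustration. It assumes a
case-insensitive filesystem folds ASCII case the way strcasecmp() does,
which ignores the Unicode-aware folding that real filesystems perform:

```c
#define _POSIX_C_SOURCE 200809L /* strcasecmp() is POSIX, not ISO C */
#include <assert.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical model of the "files" backend: each loose ref lives at a
 * path derived from its name, so on a case-insensitive filesystem two
 * names that differ only in (ASCII) case end up at the same file.
 */
static int same_file_on_ci_fs(const char *a, const char *b)
{
	return strcasecmp(a, b) == 0;
}

/* Hypothetical model of reftable: names are compared as raw bytes. */
static int same_key_in_reftable(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

Under that assumption "refs/heads/Main" and "refs/heads/main" map to one
loose-ref file with "files", while reftable, which orders reference names
by their raw bytes, keeps them as two separate keys.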

> +  ** Similarly, macOS normalizes path names that contain Unicode characters,
> +     which has the consequence that you cannot store two names with Unicode
> +     characters that are encoded differently with the "files" backend. Again,
> +     this is not an issue with the "reftable" backend.
> +  ** Deleting references with the "files" backend requires Git to rewrite the
> +     complete "packed-refs" file. In large repositories with many references
> +     this file can easily be dozens of megabytes in size; in extreme cases it
> +     may be gigabytes. The "reftable" backend uses tombstone markers for
> +     deleted references and thus does not have to rewrite all of its data.
> +  ** Repository housekeeping with the "files" backend typically performs
> +     all-into-one repacks of references. This can be quite expensive, and
> +     consequently housekeeping is a tradeoff between the number of loose
> +     references that accumulate and slow down operations that read references,
> +     and compressing those loose references into the "packed-refs" file. The
> +     "reftable" backend uses geometric compaction after every write, which
> +     amortizes costs and ensures that the backend is always in a
> +     well-maintained state.
> +  ** Operations that write multiple references at once are not atomic with the
> +     "files" backend. Consequently, Git may see in-between states when it reads
> +     references while a reference transaction is in the process of being
> +     committed to disk.
> +  ** Writing many references at once is slow with the "files" backend because
> +     every reference is created as a separate file. The "reftable" backend
> +     significantly outperforms the "files" backend by multiple orders of
> +     magnitude.

Perhaps also mention that reftable uses a binary format, which could save
storage space.
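
For readers unfamiliar with geometric compaction, a simplified model might
help. This is only a sketch: the struct and function names are
hypothetical, the factor of 2 is an assumption, and the heuristic Git's
reftable library actually uses differs in detail. After each write, the
two newest tables are merged for as long as the older one is less than
twice the size of the newer one:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TABLES 64
#define GEOMETRIC_FACTOR 2 /* assumed factor for this sketch */

/* Hypothetical model of a table stack; sizes[0] is the oldest table. */
struct table_stack {
	unsigned long sizes[MAX_TABLES];
	size_t n;
};

/*
 * Restore the geometric invariant: every table must be at least
 * GEOMETRIC_FACTOR times as large as the next newer one. Merging the
 * two newest tables can in turn violate the invariant one level down,
 * so keep going until it holds again.
 */
static void compact(struct table_stack *st)
{
	while (st->n >= 2 &&
	       st->sizes[st->n - 2] < GEOMETRIC_FACTOR * st->sizes[st->n - 1]) {
		st->sizes[st->n - 2] += st->sizes[st->n - 1];
		st->n--;
	}
}

/* Every write appends a new table and then compacts right away. */
static void write_table(struct table_stack *st, unsigned long size)
{
	assert(st->n < MAX_TABLES);
	st->sizes[st->n++] = size;
	compact(st);
}
```

With unit-sized writes the table sizes behave like a binary counter: seven
writes leave tables of sizes 4, 2 and 1, and the eighth collapses them
into a single table of size 8. The stack therefore stays shallow and the
merge work is spread across writes instead of being paid in one large
repack.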

> ++
> +Users that get immediate benefit from the "reftable" backend could continue to
> +opt in to the "reftable" format manually by setting the "init.defaultRefFormat"
> +config. But defaults matter, and we think that overall users will have a better
> +experience with fewer platform-specific quirks when they use the new backend by
> +default.
> ++
> +A prerequisite for this change is that the ecosystem is ready to support the
> +"reftable" format. Most importantly, alternative implementations of Git like
> +JGit, libgit2 and Gitoxide need to support it.
> +
>  === Removals
>
>  * Support for grafting commits has long been superseded by git-replace(1).
> diff --git a/help.c b/help.c
> index 21b778707a6..89cd47e3b86 100644
> --- a/help.c
> +++ b/help.c
> @@ -810,6 +810,8 @@ void get_version_info(struct strbuf *buf, int show_build_options)
>  			    SHA1_UNSAFE_BACKEND);
>  #endif
>  		strbuf_addf(buf, "SHA-256: %s\n", SHA256_BACKEND);
> +		strbuf_addf(buf, "default-ref-format: %s\n",
> +			    ref_storage_format_to_name(REF_STORAGE_FORMAT_DEFAULT));
>  	}
>  }
>
> diff --git a/repository.h b/repository.h
> index c4c92b2ab9c..77c4189d5dc 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -20,6 +20,12 @@ enum ref_storage_format {
>  	REF_STORAGE_FORMAT_REFTABLE,
>  };
>
> +#ifdef WITH_BREAKING_CHANGES /* Git 3.0 */
> +# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE
> +#else
> +# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_FILES
> +#endif
> +

Okay, this makes sense.
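
A quick standalone way to see what the macro resolves to, mirroring the
repository.h and help.c hunks. format_name() and default_format_line() are
hypothetical stand-ins (the patch itself calls
ref_storage_format_to_name()); building with -DWITH_BREAKING_CHANGES
flips the result to "reftable":

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum ref_storage_format {
	REF_STORAGE_FORMAT_UNKNOWN,
	REF_STORAGE_FORMAT_FILES,
	REF_STORAGE_FORMAT_REFTABLE,
};

/* Same selection as the repository.h hunk, resolved at compile time. */
#ifdef WITH_BREAKING_CHANGES /* Git 3.0 */
# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE
#else
# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_FILES
#endif

/* Hypothetical stand-in for ref_storage_format_to_name(). */
static const char *format_name(enum ref_storage_format fmt)
{
	switch (fmt) {
	case REF_STORAGE_FORMAT_FILES:
		return "files";
	case REF_STORAGE_FORMAT_REFTABLE:
		return "reftable";
	default:
		return "unknown";
	}
}

/* Formats the line the help.c hunk adds to "git version --build-options". */
static void default_format_line(char *buf, size_t len)
{
	snprintf(buf, len, "default-ref-format: %s\n",
		 format_name(REF_STORAGE_FORMAT_DEFAULT));
}
```

Because the choice is a preprocessor decision, the test in t0001 can only
learn the default by asking the built binary, which is presumably why the
version output gains this line.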

>  struct repo_path_cache {
>  	char *squash_msg;
>  	char *merge_msg;
> diff --git a/setup.c b/setup.c
> index f93bd6a24a5..f0c06c655a9 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -2541,6 +2541,8 @@ static void repository_format_configure(struct repository_format *repo_fmt,
>  			repo_fmt->ref_storage_format = ref_format;
>  	} else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) {
>  		repo_fmt->ref_storage_format = cfg.ref_format;
> +	} else {
> +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_DEFAULT;
>  	}
>  	repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format);
>  }

Shouldn't this change instead be made to REPOSITORY_FORMAT_INIT? Something
like:

diff --git a/setup.h b/setup.h
index 18dc3b7368..c1b765043f 100644
--- a/setup.h
+++ b/setup.h
@@ -150,7 +150,7 @@ struct repository_format {
 	.version = -1, \
 	.is_bare = -1, \
 	.hash_algo = GIT_HASH_SHA1, \
-	.ref_storage_format = REF_STORAGE_FORMAT_FILES, \
+	.ref_storage_format = REF_STORAGE_FORMAT_DEFAULT, \
 	.unknown_extensions = STRING_LIST_INIT_DUP, \
 	.v1_only_extensions = STRING_LIST_INIT_DUP, \
 }
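
To illustrate the suggestion, here is a trimmed-down model. The struct,
macro and function names are hypothetical, and this is not the real
setup.c logic. The built-in default comes from the initializer, and the
configure step only ever overrides it, in the order command-line option,
then GIT_DEFAULT_REF_FORMAT, then init.defaultRefFormat, which I believe
matches the documented precedence:

```c
#include <assert.h>

enum ref_storage_format {
	REF_STORAGE_FORMAT_UNKNOWN,
	REF_STORAGE_FORMAT_FILES,
	REF_STORAGE_FORMAT_REFTABLE,
};

#ifdef WITH_BREAKING_CHANGES
# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE
#else
# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_FILES
#endif

/* Hypothetical, trimmed-down stand-in for struct repository_format. */
struct repo_format_sketch {
	int version; /* -1 means "no repository yet", as in the real initializer */
	enum ref_storage_format ref_storage_format;
};

/* The built-in default lives in the initializer. */
#define REPO_FORMAT_SKETCH_INIT { \
	.version = -1, \
	.ref_storage_format = REF_STORAGE_FORMAT_DEFAULT, \
}

/*
 * Apply overrides in precedence order. If none is set, the value the
 * struct already holds is left untouched; there is no fallback branch.
 */
static void configure_sketch(struct repo_format_sketch *fmt,
			     enum ref_storage_format option,
			     enum ref_storage_format env,
			     enum ref_storage_format config)
{
	if (option != REF_STORAGE_FORMAT_UNKNOWN)
		fmt->ref_storage_format = option;
	else if (env != REF_STORAGE_FORMAT_UNKNOWN)
		fmt->ref_storage_format = env;
	else if (config != REF_STORAGE_FORMAT_UNKNOWN)
		fmt->ref_storage_format = config;
}
```

In this model, a struct that already carries a format (say, one populated
from an existing repository) keeps it when no override is given, whereas
an unconditional else-branch would reset it to the default.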

> diff --git a/t/t0001-init.sh b/t/t0001-init.sh
> index f11a40811f2..186664162fc 100755
> --- a/t/t0001-init.sh
> +++ b/t/t0001-init.sh
> @@ -658,6 +658,17 @@ test_expect_success 'init warns about invalid init.defaultRefFormat' '
>  	test_cmp expected actual
>  '
>
> +test_expect_success 'default ref format' '
> +	test_when_finished "rm -rf refformat" &&
> +	(
> +		sane_unset GIT_DEFAULT_REF_FORMAT &&
> +		git init refformat
> +	) &&
> +	git version --build-options | sed -ne "s/^default-ref-format: //p" >expect &&
> +	git -C refformat rev-parse --show-ref-format >actual &&
> +	test_cmp expect actual
> +'
> +
>  backends="files reftable"
>  for format in $backends
>  do
>
> --
> 2.50.0.195.g74e6fc65d0.dirty
