Re: [PATCH 1/2] BreakingChanges: announce switch to "reftable" format

On 25/07/02 12:14PM, Patrick Steinhardt wrote:
> The "reftable" format has come a long way and has matured nicely since
> it has been merged into git via 57db2a094d5 (refs: introduce reftable
> backend, 2024-02-07). It fixes longstanding issues that cannot be fixed
> with the "files" format in a backwards-compatible way and performs
> significantly better in many use cases.
> 
> Announce that we will switch to the "reftable" format in Git 3.0 for
> newly created repositories.
> 
> This switch is dependent on support in the larger Git ecosystem. Most
> importantly, libraries like JGit, libgit2 and Gitoxide should support
> the reftable backend so that we don't break all applications and tools
> built on top of those libraries.
> 
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  Documentation/BreakingChanges.adoc | 39 ++++++++++++++++++++++++++++++++++++++
>  setup.c                            |  6 ++++++
>  t/t0001-init.sh                    | 16 ++++++++++++++++
>  3 files changed, 61 insertions(+)
> 
> diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc
> index c6bd94986c5..c96b5319cdd 100644
> --- a/Documentation/BreakingChanges.adoc
> +++ b/Documentation/BreakingChanges.adoc
> @@ -118,6 +118,45 @@ Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@xxxxxxxxxxx>,
>  <20170223155046.e7nxivfwqqoprsqj@LykOS.localdomain>,
>  <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@xxxxxxxxxxxxxx>.
>  
> +* The default storage format for references in newly created repositories will
> +  be changed from "files" to "reftable". The "reftable" format provides
> +  multiple advantages over the "files" format:
> ++
> +  ** It is impossible to store two references that only differ in casing on
> +     case-insensitive filesystems with the "files" format. This issue is
> +     especially common on Windows, but also on older versions of macOS. As the
> +     "reftable" backend does not use filesystem paths anymore to encode
> +     reference names this problem goes away.

I believe even modern macOS uses a case-insensitive filesystem by
default. Maybe we should instead say:

  This limitation is common on Windows and macOS platforms.

> +  ** Similarly, macOS normalizes path names that contain unicode characters,
> +     which has the consequence that you cannot store two names with unicode
> +     characters that are encoded differently with the "files" backend. Again,
> +     this is not an issue with the "reftable" backend.
> +  ** Deleting references with the "files" backend requires Git to rewrite the
> +     complete "packed-refs" file. In large repositories with many references
> +     this file can easily be dozens of megabytes in size, in extreme cases it
> +     may be gigabytes. The "reftable" backend uses tombstone markers for
> +     deleted references and thus does not have to rewrite all of its data.
> +  ** Repository housekeeping with the "files" backend typically performs
> +     all-into-one repacks of references. This can be quite expensive, and
> +     consequently housekeeping is a tradeoff between the number of loose
> +     references that accumulate and slow down operations that read references,
> +     and compressing those loose references into the "packed-refs" file. The
> +     "reftable" backend uses geometric compaction after every write, which
> +     amortizes costs and ensures that the backend is always in a
> +     well-maintained state.
> +  ** Operations that write multiple references at once are not atomic with the
> +     "files" backend. Consequently, Git may see in-between states when it reads
> +     references while a reference transaction is in the process of being
> +     committed to disk.
> +  ** Writing many references at once is slow with the "files" backend because
> +     every reference is created as a separate file. The "reftable" backend
> +     significantly outperforms the "files" backend by multiple orders of
> +     magnitude.

The examples above do a good job of explaining the individual technical
benefits. I do wonder if we should also include a more general statement
aimed at users as to why the change to "reftable" is beneficial. Maybe
something like:

  The "reftable" backend addresses several performance concerns as the
  number of references in a repository grows.

> ++
> +A prerequisite for this change is that the ecosystem is ready to support the
> +"reftable" format. Most importantly, alternative implementations of Git like
> +JGit, libgit2 and Gitoxide need to support it.
> +
>  === Removals
>  
>  * Support for grafting commits has long been superseded by git-replace(1).
> diff --git a/setup.c b/setup.c
> index f93bd6a24a5..3ab0f11fbfd 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -2541,6 +2541,12 @@ static void repository_format_configure(struct repository_format *repo_fmt,
>  			repo_fmt->ref_storage_format = ref_format;
>  	} else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) {
>  		repo_fmt->ref_storage_format = cfg.ref_format;
> +	} else {
> +#ifdef WITH_BREAKING_CHANGES
> +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_REFTABLE;
> +#else
> +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_FILES;
> +#endif

Ok, so now when we build with `WITH_BREAKING_CHANGES`, the default
reference format changes to "reftable".

>  	}
>  	repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format);
>  }
> diff --git a/t/t0001-init.sh b/t/t0001-init.sh
> index f11a40811f2..e0f27484192 100755
> --- a/t/t0001-init.sh
> +++ b/t/t0001-init.sh
> @@ -658,6 +658,22 @@ test_expect_success 'init warns about invalid init.defaultRefFormat' '
>  	test_cmp expected actual
>  '
>  
> +test_expect_success 'default ref format' '
> +	test_when_finished "rm -rf refformat" &&
> +	(
> +		sane_unset GIT_DEFAULT_REF_FORMAT &&
> +		git init refformat
> +	) &&
> +	if test_have_prereq WITH_BREAKING_CHANGES
> +	then
> +		echo reftable >expect
> +	else
> +		echo files >expect
> +	fi &&
> +	git -C refformat rev-parse --show-ref-format >actual &&
> +	test_cmp expect actual
> +'

And here we add a test to verify this change. Looks good :)
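
For anyone who wants to check by hand what the test covers, something
like the following should do. This is only a rough sketch: the scratch
directory and the variable names are made up for illustration, and it
assumes a Git recent enough to know `git init --ref-format` and
`git rev-parse --show-ref-format`. The first repository shows whatever
this particular build (and environment) picks by default, the second one
pins the format explicitly.

```shell
# Scratch directory for the experiment (hypothetical location).
tmp=$(mktemp -d)

# Which format does a plain "git init" pick with this build?
git init --quiet "$tmp/default"
default_fmt=$(git -C "$tmp/default" rev-parse --show-ref-format)
echo "default: $default_fmt"

# Pin the format explicitly, regardless of the built-in default.
git init --quiet --ref-format=files "$tmp/pinned"
pinned_fmt=$(git -C "$tmp/pinned" rev-parse --show-ref-format)
echo "pinned: $pinned_fmt"

# Clean up the scratch repositories.
rm -rf "$tmp"
```

As far as I understand, users who want to stay on "files" after the
switch could also set `init.defaultRefFormat` or `GIT_DEFAULT_REF_FORMAT`
instead of passing the option on every `git init`.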

-Justin



