On 25/07/02 12:14PM, Patrick Steinhardt wrote:
> The "reftable" format has come a long way and has matured nicely since
> it has been merged into git via 57db2a094d5 (refs: introduce reftable
> backend, 2024-02-07). It fixes longstanding issues that cannot be fixed
> with the "files" format in a backwards-compatible way and performs
> significantly better in many use cases.
>
> Announce that we will switch to the "reftable" format in Git 3.0 for
> newly created repositories.
>
> This switch is dependent on support in the larger Git ecosystem. Most
> importantly, libraries like JGit, libgit2 and Gitoxide should support
> the reftable backend so that we don't break all applications and tools
> built on top of those libraries.
>
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  Documentation/BreakingChanges.adoc | 39 ++++++++++++++++++++++++++++++++++++++
>  setup.c                            |  6 ++++++
>  t/t0001-init.sh                    | 16 ++++++++++++++++
>  3 files changed, 61 insertions(+)
>
> diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc
> index c6bd94986c5..c96b5319cdd 100644
> --- a/Documentation/BreakingChanges.adoc
> +++ b/Documentation/BreakingChanges.adoc
> @@ -118,6 +118,45 @@ Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@xxxxxxxxxxx>,
>  <20170223155046.e7nxivfwqqoprsqj@LykOS.localdomain>,
>  <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@xxxxxxxxxxxxxx>.
>
> +* The default storage format for references in newly created repositories will
> +  be changed from "files" to "reftable". The "reftable" format provides
> +  multiple advantages over the "files" format:
> ++
> +  ** It is impossible to store two references that only differ in casing on
> +     case-insensitive filesystems with the "files" format. This issue is
> +     especially common on Windows, but also on older versions of macOS. As the
> +     "reftable" backend does not use filesystem paths anymore to encode
> +     reference names this problem goes away.

I believe even modern macOS by default uses a case-insensitive
file-system. Maybe we should instead say:

  This limitation is common on Windows and macOS platforms.

> +  ** Similarly, macOS normalizes path names that contain unicode characters,
> +     which has the consequence that you cannot store two names with unicode
> +     characters that are encoded differently with the "files" backend. Again,
> +     this is not an issue with the "reftable" backend.
> +  ** Deleting references with the "files" backend requires Git to rewrite the
> +     complete "packed-refs" file. In large repositories with many references
> +     this file can easily be dozens of megabytes in size, in extreme cases it
> +     may be gigabytes. The "reftable" backend uses tombstone markers for
> +     deleted references and thus does not have to rewrite all of its data.
> +  ** Repository housekeeping with the "files" backend typically performs
> +     all-into-one repacks of references. This can be quite expensive, and
> +     consequently housekeeping is a tradeoff between the number of loose
> +     references that accumulate and slow down operations that read references,
> +     and compressing those loose references into the "packed-refs" file. The
> +     "reftable" backend uses geometric compaction after every write, which
> +     amortizes costs and ensures that the backend is always in a
> +     well-maintained state.
> +  ** Operations that write multiple references at once are not atomic with the
> +     "files" backend. Consequently, Git may see in-between states when it reads
> +     references while a reference transaction is in the process of being
> +     committed to disk.
> +  ** Writing many references at once is slow with the "files" backend because
> +     every reference is created as a separate file. The "reftable" backend
> +     significantly outperforms the "files" backend by multiple orders of
> +     magnitude.

The examples above do a good job at explaining individual technical benefits.
I do wonder if we should include a more general statement aimed at users
as to why the change to reftable is beneficial. Maybe something like:

  The reftable backend addresses several performance concerns as the
  number of references in a repository scales.

> ++
> +A prerequisite for this change is that the ecosystem is ready to support the
> +"reftable" format. Most importantly, alternative implementations of Git like
> +JGit, libgit2 and Gitoxide need to support it.
> +
>  === Removals
>
>  * Support for grafting commits has long been superseded by git-replace(1).
> diff --git a/setup.c b/setup.c
> index f93bd6a24a5..3ab0f11fbfd 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -2541,6 +2541,12 @@ static void repository_format_configure(struct repository_format *repo_fmt,
>  		repo_fmt->ref_storage_format = ref_format;
>  	} else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) {
>  		repo_fmt->ref_storage_format = cfg.ref_format;
> +	} else {
> +#ifdef WITH_BREAKING_CHANGES
> +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_REFTABLE;
> +#else
> +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_FILES;
> +#endif

Ok, so now when we build with `WITH_BREAKING_CHANGES` the default reference
format is changed to reftable.

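To spell out the fallback order for myself, here is a minimal standalone
sketch. It is not the actual setup.c code: the enum and the functions
`pick_ref_format()` and `ref_format_name()` are hypothetical names, and the
first condition is an assumption since the hunk does not show it. Only the
`WITH_BREAKING_CHANGES` switch and the "configured value, else built-in
default" order are taken from the patch.

```c
/* Hypothetical stand-ins for Git's ref storage format constants. */
enum ref_format {
	FORMAT_UNKNOWN = 0,
	FORMAT_FILES,
	FORMAT_REFTABLE
};

/*
 * Pick a format: a caller-supplied value wins, then the configured
 * value, then the compile-time default. The first condition is an
 * assumption; the quoted hunk only shows the last two branches.
 */
enum ref_format pick_ref_format(enum ref_format requested,
				enum ref_format configured)
{
	if (requested != FORMAT_UNKNOWN)
		return requested;
	if (configured != FORMAT_UNKNOWN)
		return configured;
#ifdef WITH_BREAKING_CHANGES
	return FORMAT_REFTABLE;
#else
	return FORMAT_FILES;
#endif
}

/* Map a format to the name that a user would see. */
const char *ref_format_name(enum ref_format fmt)
{
	switch (fmt) {
	case FORMAT_FILES:
		return "files";
	case FORMAT_REFTABLE:
		return "reftable";
	default:
		return "unknown";
	}
}
```

Compiling this with `-DWITH_BREAKING_CHANGES` flips only the last fallback
to reftable; the first two branches behave the same in both builds, so an
explicitly chosen format is unaffected by the new default.
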
>  	}
>  	repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format);
>  }
> diff --git a/t/t0001-init.sh b/t/t0001-init.sh
> index f11a40811f2..e0f27484192 100755
> --- a/t/t0001-init.sh
> +++ b/t/t0001-init.sh
> @@ -658,6 +658,22 @@ test_expect_success 'init warns about invalid init.defaultRefFormat' '
>  	test_cmp expected actual
>  '
>
> +test_expect_success 'default ref format' '
> +	test_when_finished "rm -rf refformat" &&
> +	(
> +		sane_unset GIT_DEFAULT_REF_FORMAT &&
> +		git init refformat
> +	) &&
> +	if test_have_prereq WITH_BREAKING_CHANGES
> +	then
> +		echo reftable >expect
> +	else
> +		echo files >expect
> +	fi &&
> +	git -C refformat rev-parse --show-ref-format >actual &&
> +	test_cmp expect actual
> +'

And here we add a test to verify this change. Looks good :)

-Justin