On Wed, Jul 02, 2025 at 12:17:50PM -0500, Justin Tobler wrote:
> On 25/07/02 12:14PM, Patrick Steinhardt wrote:
> > diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc
> > index c6bd94986c5..c96b5319cdd 100644
> > --- a/Documentation/BreakingChanges.adoc
> > +++ b/Documentation/BreakingChanges.adoc
> > @@ -118,6 +118,45 @@ Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@xxxxxxxxxxx>,
> >  <20170223155046.e7nxivfwqqoprsqj@LykOS.localdomain>,
> >  <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@xxxxxxxxxxxxxx>.
> >
> > +* The default storage format for references in newly created repositories will
> > +  be changed from "files" to "reftable". The "reftable" format provides
> > +  multiple advantages over the "files" format:
> > ++
> > +  ** It is impossible to store two references that only differ in casing on
> > +     case-insensitive filesystems with the "files" format. This issue is
> > +     especially common on Windows, but also on older versions of macOS. As the
> > +     "reftable" backend does not use filesystem paths anymore to encode
> > +     reference names this problem goes away.
>
> I believe even modern macOS by default uses a case-insensitive
> file-system. Maybe we should instead say:
>
>   This limitation is common on Windows and macOS platforms.

Okay, thanks for the clarification. I thought recent versions of macOS
were case-sensitive by default.

> > +  ** Similarly, macOS normalizes path names that contain unicode characters,
> > +     which has the consequence that you cannot store two names with unicode
> > +     characters that are encoded differently with the "files" backend. Again,
> > +     this is not an issue with the "reftable" backend.
> > +  ** Deleting references with the "files" backend requires Git to rewrite the
> > +     complete "packed-refs" file. In large repositories with many references
> > +     this file can easily be dozens of megabytes in size, in extreme cases it
> > +     may be gigabytes. The "reftable" backend uses tombstone markers for
> > +     deleted references and thus does not have to rewrite all of its data.
> > +  ** Repository housekeeping with the "files" backend typically performs
> > +     all-into-one repacks of references. This can be quite expensive, and
> > +     consequently housekeeping is a tradeoff between the number of loose
> > +     references that accumulate and slow down operations that read references,
> > +     and compressing those loose references into the "packed-refs" file. The
> > +     "reftable" backend uses geometric compaction after every write, which
> > +     amortizes costs and ensures that the backend is always in a
> > +     well-maintained state.
> > +  ** Operations that write multiple references at once are not atomic with the
> > +     "files" backend. Consequently, Git may see in-between states when it reads
> > +     references while a reference transaction is in the process of being
> > +     committed to disk.
> > +  ** Writing many references at once is slow with the "files" backend because
> > +     every reference is created as a separate file. The "reftable" backend
> > +     significantly outperforms the "files" backend by multiple orders of
> > +     magnitude.
>
> The examples above do a good job at explaining individual technical
> benefits. I do wonder if we should include a more general statement
> aimed at users as to why the change to reftables is beneficial. Maybe
> something like:
>
>   The reftables backend addresses several performance concerns as the
>   number of references scale in a repository.

I think this would be a bit too handwavy. I'd rather want to point out
the specific cases where we know it to perform better.

Patrick