On Mon, Apr 14, 2025 at 5:31 AM Kaartic Sivaraam <kaartic.sivaraam@xxxxxxxxx> wrote:
>
> Hello all,
>
> As part of the Git's 20th year anniversary, we from the Git Rev News
> team are thinking of doing a community interview where we would share a
> list of questions that we've prepared and we would like to welcome
> answers from anyone in the community for them. We could gather the
> answers for them upto a particular time (like 25/April or so) and begin
> curating the answers into a special interview for this month's edition.
> The questions are below. Feel free to respond with your answers to this
> mail thread. Let me know if I've missed to include any particularly
> compelling question.
>
> - What's your favorite Git trick or workflow that you wish more people
> knew about?

range-diff. The ideas behind it ought to be the basis for code review, IMO. Commits should be the unit of review (including commit messages as a fundamental and primary thing to be reviewed), and a series of commits should be the unit of merging. I dislike most code review tools because they get one or both of those things wrong. Getting both of them right naturally leads to range-diff, or something like it, being a very important part of the workflow, at a minimum for detecting which commits in a series are unmodified and which have been updated and need further review.

> - What was your worst Git disaster, and how did you recover from it?

My worst Git-related disaster wasn't with Git directly but with the Git hosting software we used at a prior job, Gerrit. 'twas a "startup" that was still forming good practices. We had both a production and a staging instance. The staging instance was seeded with a copy of production data so we could do scale testing...but that seeding process was a multi-step manual thing; it hadn't been automated. One step was, as best I recall, "drop database gerrit", followed by loading the production copy of the mysql database (this was long before NoteDB arrived). And as many readers have probably guessed by now, I was on the wrong host one day when I ran that command. The actual git repositories were still intact, but the review metadata was toast.

Luckily, we had a backup from about 7 hours earlier, so we could recover the older review metadata and, with some hackery, fix the mysql metadata mismatch with the newer repository contents. And since Gerrit emailed folks comments from reviews as they were posted, we could tell people to look at their emails for the pieces we couldn't recover.

It was a really long night trying to fix things. Some folks told me that, just from looking at me, they thought I was going to throw up. But I learned how wonderful it was to be at a company with blameless post-mortems, and I appreciated the many folks who reached out to tell me stories of mistakes they had made. They were more interested in whether we learned our lesson and put processes into place to prevent repeats, and I definitely did both.

I did, of course, also get some good-natured ribbing, such as people saying I got to play the part of little Bobby Tables once (see https://xkcd.com/327/ if you don't know that reference). I kindly reminded them that I didn't drop a table -- I dropped the whole database (plus, it wasn't injection, it was just running a command on the wrong host). Also, one of my colleagues helpfully modified the prompt on production to be red and bold, "This is PROD Gerrit", and the prompt on staging to be green, "This is staging Gerrit; it's okay to drop database here!"
The prompts ended up not mattering, since I automated the process and made sure it just errored out if run on prod instead of staging. But the prompt persisted for many years anyway, because I thought it was a hilarious way to poke fun at my blunder.

> - If you could go back in time and change one design decision in Git,
> what would it be?

The index. For a few reasons.

1) Performance.

1a) The index is pervasive throughout the codebase, and while it works great for small repositories, it means that many operations are O(size of repository) instead of O(size of changes). Sparse indices help, but the code has to be carefully audited for sparse indices to work with each codepath, and even then there tends to be a fallback of just-load-everything-anyway, because the data structure doesn't lend itself nicely to just expanding a little more.

1b) An under-appreciated aspect of the performance improvements that came from our new merge strategy, merge-ort, was dispensing with the index as the primary data structure. The index had two problems:

1b-1) First, it meant loading every path in the repository, which would have prevented ort's optimization of avoiding recursion into subtrees when unnecessary (an optimization that often made merges e.g. 50x faster). Sparse indices didn't exist back then, but even if they had, we would have had to complicate them significantly in order to have their sparseness be determined by renames and by the intersection of modified paths on the two sides of history, instead of by user-defined path rules; I think that would have been much more complicated than just dispensing with the index as the data structure, but we didn't even have sparse indices back then anyway.

1b-2) Second, the use of the index in the old merge strategy, merge-recursive, resulted in O(N^2) behavior, since entries (including conflicted higher-order stages) had to be inserted in sorted order. Deleting entries didn't have the same O(N^2) problem, due to some tricks to queue the deletion for later, but attempting to do the same for insertions was far from straightforward, and I believe it would have required making some other data structure primary and then forming the index at the end. (Note that the primary data structure used, whatever it is, cannot just be a list of things to insert; it also needs to be checked for various properties intermingled with the insertions...and those checks sometimes relied on the fact that the index was sorted for quick lookups.)

(Note that a tree-structured index rather than a linear index would resolve these problems. But retrofitting the entire codebase is probably never going to happen...)

2) Cognitive Complexity.

The funny thing is, although I say this, I use the index all the time. I use `git add -p` a lot. I very much need to slice and dice my changes into different commits, and I tend to have dirty changes that I don't want pushed. But slicing and dicing before things are committed, as opposed to being able to slice and dice after, is a choice that adds a lot of complexity to the user interface, and it does so even for users who aren't interested in slicing and dicing commits. We don't have a sufficiently flexible set of tooling for slicing and dicing commits after-the-fact within git to switch to a post-commit-slice-and-dice workflow even today, but I suspect that some of the ideas from JJ would or could be much better than the methods I use today in git to slice and dice commits.
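To make that last point concrete, the closest thing I have today to slicing a commit up after the fact is roughly the following -- just a sketch, with placeholder names, and there are several variations:

    # Split an existing commit into two, after the fact.
    # <commit> is a placeholder for the commit to be split up.
    $ git rebase -i <commit>^      # mark <commit> as "edit" in the todo list
    $ git reset HEAD^              # undo the commit, keeping its changes in the worktree
    $ git add -p                   # stage just the first slice
    $ git commit -m "first piece"
    $ git add -u                   # stage the remainder
    $ git commit -m "second piece"
    $ git rebase --continue

Compare that amount of ceremony with what the same operation could look like if commits were designed to be sliced and diced after the fact.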
> - Which Git feature or improvement over the past 20 years do you think
> had the biggest impact on your workflow?

Speed. Being able to instantly switch branches (in smaller repos, sure, but CVS and SVN couldn't pull it off even in small repos) was a game changer.

> - What Git problem that existed 10 years ago has been most
> successfully solved?

Merging and rebasing with lots of renames (and generally merging without a worktree or index). I'm obviously a bit biased on this point, but that doesn't mean I'm wrong. ;-) It used to be awful and works great now.

Relatedly, merging without a worktree or index used to be problematic; you had to either use an alternative merge strategy with limited capabilities, or use something other than git (e.g. libgit2). But now git handles it well with its default merge strategy (there's a small example at the end of this mail).

> - Which Git commands or workflows do you think are still misunderstood
> or underutilized today?

range-diff is very under-utilized, but I already discussed that above.

> - What's one Git based project, tool, or extension you think deserves
> more recognition from the community?
>
> - What Git feature or capability surprised you most when you first
> discovered it?
>
> - What's your boldest prediction about how version control might look
> in another 20 years?

I'm more interested in what storms might be brewing along that path, and what we might be able to do to avoid them. In particular, some questions and observations in that area:

* With monorepos growing ever larger, do we have hard-to-workaround-or-fix design decisions that pose scaling challenges? e.g.
  * the index data structure
  * per-directory .gitignore files, per-directory .gitattributes files, etc.
* ...or do the prominent Git forges have hard-to-workaround-or-fix design decisions that'll give Git a reputation for not scaling? e.g.
  * making refs/pull/NNN/merge a public ref and implicitly updating it excessively
* Will we face a crisis of interest? e.g.
  * git is currently written in C. Even if that's not a liability already, coupled with "decades" I think it is. Young developers probably don't want to learn C, and older ones who already know C may worry about C becoming a Fortran or Cobol.
  * Companies employing git developers think "git already won" and redeploy those engineers on other problems.
* Will the combination of the issues above result in folks who want improvements deciding that their best bet is not in improving git but in creating/funding an alternative? Will that snowball?

To me, the entry of new projects like jj and sapling suggests the above are real concerns already rather than just theoretical. Both projects have compelling things that git lacks. I like the friendly competition, and the jj and sapling developers are awesome to talk to at Git Merge conferences. But there is a risk that this friendly competition mirrors that of Git and Mercurial from years past, and that Git at some future point down the road ends up on the other side of that history and gets largely displaced by the alternatives. I'd rather not see that happen, but I sometimes wonder if we're taking enough measures to avoid marching towards such an outcome.
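P.S. Since I mentioned merging without a worktree or index above, here's roughly what that looks like these days -- just a sketch, with placeholder branch names, and it needs a reasonably recent git:

    # Merge two branches without touching the worktree or the index.
    # On success this prints the OID of the resulting tree; if there are
    # conflicts it exits non-zero and also reports the conflicted paths.
    # "side1" and "side2" are placeholder branch names.
    $ git merge-tree --write-tree side1 side2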