I've been doing some testing of reftable at $DAYJOB, and I found an interesting performance problem when creating many refs.

I've included a script below which takes 50,000 recent commits, creates a file suitable for `git update-ref --stdin`, deletes all of the existing refs, and then uses that file to create the 50,000 refs. The ref creation is timed using Linux's `/usr/bin/time`. (This is partially extracted from a larger script, so please accept my apologies for some untidiness.)

With the files backend, the output is:

  1.75user 3.73system 0:05.50elapsed 99%CPU (0avgtext+0avgdata 166344maxresident)k
  0inputs+442880outputs (0major+27962minor)pagefaults 0swaps

With the reftable backend, the output is:

  56.91user 0.52system 0:57.44elapsed 99%CPU (0avgtext+0avgdata 160416maxresident)k
  0inputs+6784outputs (0major+26151minor)pagefaults 0swaps

Both measurements were taken on `next`, so they should include all the relevant patches I'm aware of. I've tested on two X1 Carbons, one running Ubuntu 24.04 and one running Debian unstable; both are reasonably beefy machines with modern Linux distributions.

With the reftable backend, the ref creation uses over 30 times as much user CPU time (56.91s versus 1.75s) and takes about 10 times as long in wall-clock time (57.44s versus 5.50s), which is concerning. While this is a synthetic measurement, I had intended to use it to determine the performance characteristics of the reference-update portion of pushing a large repository for the first time.

I admit I haven't investigated any further what's going wrong here, but the behaviour is very noticeable, so it may be easy to profile.

One note: the script will run faster, and be more useful for reproducing the problem, if you change the repository source (`REPO_SRC`) to a local clone of the Linux repository.

----
#!/bin/sh -e

# This script will reproduce a performance problem with many (50000) refs using
# the current version of reftable in next. The directory `testcase` under the
# current directory will be removed and replaced.
#
# Once the script is finished, you can do `cat testcase/tracedir/*/re-creation`
# to see the performance characteristics of the files backend (first) and the
# reftable backend (second).

# Your friendly neighbourhood Linux repository. This may be any valid remote,
# including an HTTPS or SSH URL.
REPO_SRC="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git"
TAG="v6.13"

export GIT_CONFIG_GLOBAL=/dev/null

timed_op () {
	local output="$1"
	local message="$2"
	shift
	shift

	printf '%s...' "$message" >&2
	/usr/bin/time -o "$TRACEDIR/$output" "$@"
	printf 'done.\n' >&2
}

delete_refs () {
	local output="$1"

	(
		echo "start"
		git for-each-ref --format="%(refname)" | sed -e 's/^/delete /'
		echo "prepare"
		echo "commit"
	) | timed_op "$output" "Deleting all references" git update-ref --stdin
}

fake_refs=true

while [ $# -gt 0 ]
do
	case "$1" in
		--real-refs)
			fake_refs=false
			shift
			;;
		*)
			break
			;;
	esac
done

rm -fr testcase
mkdir testcase
cd testcase

git clone --bare "$REPO_SRC" repo
mkdir tracedir

for backend in files reftable
do
	git clone --bare repo $backend
	(
		set -e
		cd $backend
		TRACEDIR="$(realpath "../tracedir/$backend")"
		mkdir -p "$TRACEDIR"

		if [ "$backend" = reftable ]
		then
			timed_op "migration" "Migrating to reftable" git refs migrate --ref-format=reftable
		fi

		if $fake_refs
		then
			git rev-list "$TAG" | head -n 50000 | perl -pe '
				$count++;
				$choice = $count % 4;
				if ($choice == 0) {
					s!^(.*)$!create refs/heads/ref-$count $1!;
				} elsif ($choice == 1) {
					s!^(.*)$!create refs/remotes/bk2204/ref-$count $1!;
				} elsif ($choice == 2) {
					s!^(.*)$!create refs/remotes/origin/ref-$count $1!;
				} elsif ($choice == 3) {
					s!^(.*)$!create refs/tags/tag-$count $1!;
				}
			' | sort >all-refs
		else
			git for-each-ref --format="%(refname) %(objectname)" | sed -e 's/^/create /' >all-refs
		fi

		delete_refs "deletion"
		timed_op "re-creation" "Re-creating refs" git update-ref --stdin <all-refs
	)
done
----

-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA
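To compare the two `re-creation` files the script leaves under `testcase/tracedir/` without doing the arithmetic by hand, here is a minimal sketch of a helper. It assumes the files contain GNU time's default output format (as in the measurements quoted above), and the function names `time_fields` and `compare_runs` are hypothetical, not part of Git or the original script. It prints how many times more user CPU time and wall-clock time the reftable run took than the files run.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script). Assumes GNU time's
# default output format, e.g. "1.75user 3.73system 0:05.50elapsed ...".

# Print "<user-seconds> <elapsed-seconds>" for one GNU time output file.
time_fields () {
	awk '
		/user/ && /elapsed/ {
			for (i = 1; i <= NF; i++) {
				if ($i ~ /user$/) {
					user = $i
					sub(/user$/, "", user)
				}
				if ($i ~ /elapsed$/) {
					e = $i
					sub(/elapsed$/, "", e)
					# %E is [hours:]minutes:seconds; fold it into seconds.
					n = split(e, parts, ":")
					secs = 0
					for (j = 1; j <= n; j++)
						secs = secs * 60 + parts[j]
				}
			}
			print user, secs
			exit
		}
	' "$1"
}

# Print the reftable/files ratio for user CPU time and wall-clock time.
# $1: files backend output file; $2: reftable backend output file.
compare_runs () {
	files=$(time_fields "$1")
	reftable=$(time_fields "$2")
	echo "$files $reftable" | awk '{
		printf "user: %.1fx  elapsed: %.1fx\n", $3 / $1, $4 / $2
	}'
}
```

For example, `compare_runs testcase/tracedir/files/re-creation testcase/tracedir/reftable/re-creation`; with the two outputs quoted above, this prints `user: 32.5x  elapsed: 10.4x`.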