Poor performance using reftable with many refs

I've been doing some testing of reftable at $DAYJOB and I found an
interesting performance problem when creating many refs.

I've attached a script which takes 50,000 recent commits, creates a file
suitable for `git update-ref --stdin`, deletes all of the existing refs,
and then uses that file to create the 50,000 refs.  The ref creation is
timed using Linux's `/usr/bin/time`.  (This is partially extracted from
a larger script, so please accept my apologies for some untidiness.)

With the files backend, the output is as below:

  1.75user 3.73system 0:05.50elapsed 99%CPU (0avgtext+0avgdata 166344maxresident)k
  0inputs+442880outputs (0major+27962minor)pagefaults 0swaps

With the reftable backend, this is the output:

  56.91user 0.52system 0:57.44elapsed 99%CPU (0avgtext+0avgdata 160416maxresident)k
  0inputs+6784outputs (0major+26151minor)pagefaults 0swaps

Both measurements were taken on next, so they should include all the
relevant patches I'm aware of.  I've tested on two X1 Carbons, one
running Ubuntu 24.04 and one running Debian unstable; both are
reasonably beefy machines with modern Linux distributions.

The reftable backend takes about ten times as long in wall-clock time
(57.44s versus 5.50s) and over 30 times as much user CPU time (56.91s
versus 1.75s), which is concerning.  While this is a synthetic
measurement, I had intended to use it to determine the performance
characteristics of the reference-update portion of pushing a large
repository for the first time.
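To double-check ratios like these from the reports the script leaves under testcase/tracedir, a small sketch follows. It assumes GNU time's default output format (what `/usr/bin/time -o` writes when no `-f` format is given); `compare_times` is a hypothetical helper and not part of the attached script. A ratio above 1 means the second report (reftable) took longer.

```shell
# Print how many times larger the second GNU time report is than the first,
# for user, system and elapsed time.  Assumes the default /usr/bin/time
# output format, whose first line looks like
#   1.75user 3.73system 0:05.50elapsed 99%CPU (...)
compare_times () {
  awk '
    /elapsed/ {
      u = $1; sub(/user$/, "", u)
      s = $2; sub(/system$/, "", s)
      e = $3; sub(/elapsed$/, "", e)
      # Elapsed is [h:]m:ss.ss, so fold each colon-separated part into seconds.
      n = split(e, p, ":")
      t = 0
      for (i = 1; i <= n; i++) t = t * 60 + p[i]
      k++
      user[k] = u; sys[k] = s; wall[k] = t
    }
    END {
      printf "user %.1fx system %.2fx elapsed %.1fx\n",
        user[2] / user[1], sys[2] / sys[1], wall[2] / wall[1]
    }' "$1" "$2"
}
```

Run as `compare_times testcase/tracedir/files/re-creation testcase/tracedir/reftable/re-creation`; on the two reports quoted above it prints `user 32.5x system 0.14x elapsed 10.4x`.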

I admit I haven't investigated any further what's going wrong here, but
the slowdown is very noticeable, so it may be easy to profile.

One note: the script will run faster, and be more convenient for
reproducing the problem, if you change REPO_SRC to point at a local
clone of the Linux repository.
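One way to do that without editing the script for every run is a variation on its REPO_SRC assignment that lets the environment override it. This is a sketch, not part of the attached script; the `$HOME/src/linux.git` path and the `reftable-perf.sh` file name are hypothetical.

```shell
# Default to the upstream repository, but let the environment point the
# script at a local bare clone instead, for example one made once with
#   git clone --bare https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git "$HOME/src/linux.git"
# and then used as
#   REPO_SRC="$HOME/src/linux.git" sh reftable-perf.sh
REPO_SRC="${REPO_SRC:-https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git}"
```

With REPO_SRC unset, the behaviour is the same as the original hard-coded assignment.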

----
#!/bin/sh -e
# This script will reproduce a performance problem with many (50000) refs using
# the current version of reftable in next.  The directory `testcase` under the
# current directory will be removed and replaced.
#
# Once the script is finished, you can do `cat testcase/tracedir/*/re-creation`
# to see the performance characteristics of the files backend (first) and the
# reftable backend (second).

# Your friendly neighbourhood Linux repository.  This may be any valid remote,
# including an HTTPS or SSH URL.
REPO_SRC="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git"
TAG="v6.13"

# Keep the user's global configuration from affecting the measurements.
export GIT_CONFIG_GLOBAL=/dev/null

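# Run a command under /usr/bin/time, writing its timing report to
# $TRACEDIR/<output>.  Usage: timed_op <output> <message> <command>...
timed_op () {
  local output="$1"
  local message="$2"
  shift 2
  printf '%s...' "$message" >&2
  /usr/bin/time -o "$TRACEDIR/$output" "$@"
  printf 'done.\n' >&2
}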

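# Delete every existing ref in a single explicit transaction
# (start/prepare/commit), timing the git update-ref call.
delete_refs () {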
  local output="$1"
  (
    echo "start"
    git for-each-ref --format="%(refname)" | sed -e 's/^/delete /'
    echo "prepare"
    echo "commit"
  ) | timed_op "$output" "Deleting all references" git update-ref --stdin
}

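# --real-refs re-creates the repository's own refs instead of 50,000
# synthetic ones spread over four namespaces.
fake_refs=true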
while [ $# -gt 0 ]
do
  case "$1" in
    --real-refs)
      fake_refs=false
      shift
      ;;
    *)
      break
      ;;
  esac
done

rm -fr testcase
mkdir testcase
cd testcase
git clone --bare "$REPO_SRC" repo

mkdir tracedir

for backend in files reftable
do
  git clone --bare repo $backend
  (
    set -e
    cd $backend
    TRACEDIR="$(realpath "../tracedir/$backend")"
    mkdir -p "$TRACEDIR"

    if [ "$backend" = reftable ]
    then
      timed_op "migration" "Migrating to reftable" git refs migrate --ref-format=reftable
    fi

    if $fake_refs
    then
      # Spread 50,000 commits round-robin over heads, two remotes and tags.
      git rev-list "$TAG" | head -n 50000 | perl -pe '
        $count++;
        $choice = $count % 4;
        if ($choice == 0) {
          s!^(.*)$!create refs/heads/ref-$count $1!;
        } elsif ($choice == 1) {
          s!^(.*)$!create refs/remotes/bk2204/ref-$count $1!;
        } elsif ($choice == 2) {
          s!^(.*)$!create refs/remotes/origin/ref-$count $1!;
        } elsif ($choice == 3) {
          s!^(.*)$!create refs/tags/tag-$count $1!;
        }
      ' | sort >all-refs
    else
      git for-each-ref --format="%(refname) %(objectname)" | sed -e 's/^/create /' >all-refs
    fi
    delete_refs "deletion"
    timed_op "re-creation" "Re-creating refs" git update-ref --stdin <all-refs
  )
done
----
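For reference, each line the script feeds to `git update-ref --stdin` is a `create <refname> <new-oid>` command. If perl isn't available, the perl step in the script could be replaced with an awk sketch along these lines; `make_create_cmds` is a hypothetical name, and it aims to reproduce the same round-robin over the four namespaces by line number.

```shell
# Read object IDs (one per line) on stdin and print `git update-ref --stdin`
# create commands, cycling through four ref namespaces by line number just
# like the perl step in the script.
make_create_cmds () {
  awk '{
    choice = NR % 4
    if (choice == 0)
      ref = "refs/heads/ref-" NR
    else if (choice == 1)
      ref = "refs/remotes/bk2204/ref-" NR
    else if (choice == 2)
      ref = "refs/remotes/origin/ref-" NR
    else
      ref = "refs/tags/tag-" NR
    print "create " ref " " $1
  }'
}
```

It would slot into the script as `git rev-list "$TAG" | head -n 50000 | make_create_cmds | sort >all-refs`.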
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA
