Re: [PATCH 3/4] builtin/remote: rework how remote refs get renamed

On Mon, Jul 28, 2025 at 03:08:47PM +0200, Patrick Steinhardt wrote:

> The next-best thing is to do it in two transactions: one to delete all
> the references, and one to recreate the references and their reflogs.
> This significantly speeds up the operation with the "files" backend. The
> following benchmark renames a remote with 10000 references:

Hmm. I was surprised to see so much reflog code here. It looks like
you're replaying the old reflog entry by entry. But the old code was
leaning on refs_rename_ref() to do the individual renames, which just
asks the backend to handle that for us (so e.g., the files backend just
copies/moves the log files).

So it feels like ideally we'd be able to create a transaction element
for renaming, and then the backends could similarly do what makes sense
for them (and we wouldn't need a bunch of reflog code here).

I guess that does not work with the two delete/create transactions you
end up with here, though. And you need those to worry about D/F
conflicts. But then...how did the original handle D/F conflicts? It kind
of looks like it didn't, as it is doing a mass ref-by-ref rename in the
middle.

If the refs code learned how to order things to handle the D/F conflicts
within a transaction, then we could do a single transaction. And it
could learn about rename primitives.
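
For anyone following along who hasn't run into a D/F (directory/file) conflict before, here is a minimal sketch of the problem and of why deleting first and creating second avoids it. The ref names and the temporary repository are made up for illustration; this is not what the patch or builtin/remote.c does internally, only the user-visible behavior via plain `git update-ref`.

```shell
#!/bin/sh
# Sketch only: show that a ref "foo" blocks creation of "foo/bar",
# and that deleting "foo" first makes the creation succeed.
tmp=$(mktemp -d) || exit 1
git init -q "$tmp/repo"
cd "$tmp/repo" || exit 1

# Identity passed inline so the sketch does not depend on global config.
git -c user.name=demo -c user.email=demo@example.com \
	commit -q --allow-empty -m base
oid=$(git rev-parse HEAD)

# Hypothetical ref names, chosen only to trigger the conflict.
git update-ref refs/remotes/origin/foo "$oid"

# "foo" exists as a ref, so "foo/bar" would need it to be a directory.
if git update-ref refs/remotes/origin/foo/bar "$oid" 2>/dev/null
then
	df_result="no conflict"
else
	df_result="conflict"
fi

# Delete first, then create: the same update now goes through.
reorder_result="failed"
git update-ref -d refs/remotes/origin/foo
git update-ref refs/remotes/origin/foo/bar "$oid" && reorder_result="ok"

echo "first attempt: $df_result, after delete: $reorder_result"
```

A single transaction that ordered its deletions before its creations would get the same effect without the caller having to split the work in two.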

I dunno. I think that would be nicer, but it's probably not worth
holding up this topic. Your perf numbers are very nice. I guess the
possible flip-side is that the existing code could be faster when
renaming a single ref (so no quadratic behavior) with a pathological
reflog (so moving the file is faster than re-writing all of those logs).

Hmm, yeah. Something like this:

	cat >setup <<-\EOF
	#!/bin/sh

	rm -rf repo
	git init repo
	cd repo

	git init server
	git -C server commit --allow-empty -m foo

	git remote add origin server
	git fetch

	# make the reflog gigantic
	perl -i -ne 'for my $i (1..10**5) { print }' .git/logs/refs/remotes/origin/main
	EOF
	chmod +x setup

	hyperfine -p ./setup -L v old,new './git.{v} -C repo remote rename origin foo'

results in:
  
  Benchmark 1: ./git.old -C repo remote rename origin foo
    Time (mean ± σ):       5.5 ms ±   1.1 ms    [User: 1.5 ms, System: 1.3 ms]
    Range (min … max):     3.6 ms …   9.7 ms    58 runs
  
  Benchmark 2: ./git.new -C repo remote rename origin foo
    Time (mean ± σ):     476.3 ms ±   9.8 ms    [User: 203.6 ms, System: 268.0 ms]
    Range (min … max):   467.8 ms … 498.7 ms    10 runs
  
  Summary
    ./git.old -C repo remote rename origin foo ran
     86.43 ± 16.61 times faster than ./git.new -C repo remote rename origin foo

It's hard to bring myself to care, though. This is a stupidly
pathological reflog, and the absolute time change is peanuts compared to
the per-ref cost you're fixing here.
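
As a rough back-of-the-envelope check on "peanuts", the mean times above can be turned into a per-entry cost. This assumes the reflog held a single entry before the perl one-liner duplicated it, so that the rewritten log has 10^5 entries; that count is an assumption, not something measured here.

```shell
#!/bin/sh
# Sketch: extra time per replayed reflog entry, in microseconds,
# from the hyperfine means above (476.3 ms new vs. 5.5 ms old).
# The entry count of 100000 is assumed, see the note above.
per_entry_us=$(LC_ALL=C awk 'BEGIN {
	printf "%.1f", (476.3 - 5.5) * 1000 / 100000
}')
echo "approx. $per_entry_us us per reflog entry"
```

So replaying costs on the order of a few microseconds per entry, which only adds up when a single reflog is absurdly large.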

-Peff



