Re: [PATCH 3/4] builtin/remote: rework how remote refs get renamed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 29, 2025 at 04:16:58AM -0400, Jeff King wrote:
> On Mon, Jul 28, 2025 at 03:08:47PM +0200, Patrick Steinhardt wrote:
> 
> > The next-best thing is to do it in two transactions: one to delete all
> > the references, and one to recreate the references and their reflogs.
> > This signicantly speeds up the operation with the "files" backend. The
> > following benchmark renames a remote with 10000 references:
> 
> Hmm. I was surprised to see so much reflog code here. It looks like
> you're replaying the old reflog entry by entry. But the old code was
> leaning on refs_rename_ref() to do the individual renames, which just
> asks the backend to handle that for us (so e.g., the files backend just
> copies/moves the log files).
> 
> So it feels like ideally we'd be able to create a transaction element
> for renaming, and then the backends could similarly do what makes sense
> for them (and we wouldn't need a bunch of reflog code here).

I think overall the ref transaction code is in for a bit of a redesign.
The current one-size-fits-all `struct ref_update` doesn't make a lot of
sense anymore. The requirements have shifted because of the reftable
backend, where we want to redesign interfaces so that we batch updates
together to the best extent possible.

And this patch series here demonstrates that not only the "reftable"
backend benefits.

In any case, the current infrastructure is extremely brittle and every
change one does is a bit like threading a needle. More likely than not
you missed this one edge case where changing a flag for one of the
updates has a consequence in a completely unrelated place.

I think we should start splitting this up more and introduce different
update types:

  - Forced updates where we don't very the old state.

  - Raceless updates where we do.

  - Reflog updates that only write a message as well as an old/new
    state.

If we had such an infrastructure it would also be feasible to introduce
more types, like deletes or renames.

> I guess that does not work with the two delete/create transactions you
> end up with here, though. And you need those to worry about D/F
> conflicts. But then...how did the original handle D/F conflicts? It kind
> of looks like it didn't, as it is doing a mass ref-by-ref rename in the
> middle.
> 
> If the refs code learned how to order things to handle the D/F conflicts
> within a transaction, then we could do a single transaction. And it
> could learn about rename primitives.

True. But as I already mentioned to Junio I don't really know how to
backfill D/F conflict handling in the "files" backend. The problem is
that with preexisting "refs/heads/parent" and "refs/heads/parent/child"
you cannot create the latter ".lock" file. Sure, we can hack our way
around that in some manner. But is that backwards compatible if another
Git client were to operate in the same repository? I dunno.

> I dunno. I think that would be nicer, but it's probably not worth
> holding up this topic. Your perf numbers are very nice. I guess the
> possible flip-side is that the existing code could be faster when
> renaming a single ref (so no quadratic behavior) with a pathological
> reflog (so moving the file is faster than re-writing all of those logs).
> 
> Hmm, yeah. Something like this:
> 
> 	cat >setup <<-\EOF
> 	#!/bin/sh
> 
> 	rm -rf repo
> 	git init repo
> 	cd repo
> 
> 	git init server
> 	git -C server commit --allow-empty -m foo
> 
> 	git remote add origin server
> 	git fetch
> 
> 	# make the reflog gigantic
> 	perl -i -ne 'for my $i (1..10**5) { print }' .git/logs/refs/remotes/origin/main
> 	EOF
> 
> 	hyperfine -p ./setup -L v old,new './git.{v} -C repo remote rename origin foo'
> 
> results in:
>   
>   Benchmark 1: ./git.old -C repo remote rename origin foo
>     Time (mean ± σ):       5.5 ms ±   1.1 ms    [User: 1.5 ms, System: 1.3 ms]
>     Range (min … max):     3.6 ms …   9.7 ms    58 runs
>   
>   Benchmark 2: ./git.new -C repo remote rename origin foo
>     Time (mean ± σ):     476.3 ms ±   9.8 ms    [User: 203.6 ms, System: 268.0 ms]
>     Range (min … max):   467.8 ms … 498.7 ms    10 runs
>   
>   Summary
>     ./git.old -C repo remote rename origin foo ran
>      86.43 ± 16.61 times faster than ./git.new -C repo remote rename origin foo
> 
> It's hard to bring myself to care, though. This is a stupidly
> pathological reflog, and the absolute time change is peanuts compared to
> the per-ref cost you're fixing here.

For the "files" backend performance is worse, for the "reftable" backend
I'd expect that this might even be faster. Mostly because there is no
way to trivially rename a reflog -- we basically do the same on a rename
as we are doing with this patch series now.

Overall I don't care too much about this edge case. By default we never
write reflogs for remote references anyway, and I doubt that you'll ever
end up with a remote reflog that has thousands of entries. So I'd rather
make the general case fast even if the esoteric case becomes slower.

But ideally we're able to lift such limitations in the future if we were
to do the above rework.

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux