Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim

Alexander Monakov <amonakov@xxxxxxxxx> · Mon, 4 Aug 2025 17:39:37 +0300 (MSK)

On Mon, 4 Aug 2025, Phillip Wood wrote:

> > Switch xdl_hash_record_verbatim to additive hashing and implement
> > an optimized loop following the scheme suggested by Noah.
> > 
> > Timing 'git log --oneline --shortstat v2.0.0..v2.5.0' under perf, I got
> > 
> > version | cycles, bn | instructions, bn
> > ---------------------------------------
> > A         6.38         11.3
> > B         6.21         10.89
> > C         5.80          9.95
> > D         5.83          8.74
> > ---------------------------------------
> > 
> > A: baseline (git master at e4ef0485fd78)
> > B: plus 'xdiff: refactor xdl_hash_record()'
> > C: and plus this patch
> > D: with 'xdiff: use xxhash' by Phillip Wood
> 
> I think it would be helpful to say that B is the previous patch and provide a
> link for D.

Ok, reworded locally, will appear in v2.

> > The resulting speedup for xdl_hash_record_verbatim itself is about 1.5x.
> 
> While that's interesting it does not tell us how much this speeds up diff
> generation.

That's what the 'cycles' column in the table gives (6.21/5.8 = 1.070...)

> Running the command above under hyperfine it is 1.02 ± 0.01 times
> faster than the previous patch and 1.11 ± 0.01 times faster than master.

Then you get 9% from the inlining patch and only 2% from the faster hash
function? That's a bit surprising, which compiler and CPU you used? Is it
with default optimization (-O2)?

> Using
> xxhash (D above) is 1.03 ± 0.01 times faster than this patch. How do the
> changes below affect compilers other than gcc and clang than do not see the
> re-association barrier?

I'd say under reasonable assumptions (e.g. a not too ancient CPU with 3-cycle
integer multiplication) the new scheme is generally faster even without asm.

But Git can certainly follow Glibc's choice and employ this only on x86_64
(and only with GCC or Clang).

> We'd want to make sure that it does not result in
> slower diffs. Can we use atomic_signal_fence() on compilers that support C11?

No, what we need to do here is outside of the abstract machine's view, standard
functions are not going to help.

Alexander

> (we don't require C11 so we'd have to make it optional but it is supported by
> things like MSVC)
> 
> Thanks
> 
> Phillip