Karthik Nayak <karthik.188@xxxxxxxxx> writes:

> The 'git bundle create' command has non-linear performance with the
> number of refs in the repository. Benchmarking the command shows that
> a large portion of the time (~75%) is spent in the
> `object_array_remove_duplicates()` function.
>
> The `object_array_remove_duplicates()` function was added in
> b2a6d1c686 (bundle: allow the same ref to be given more than once,
> 2009-01-17) to skip duplicate refs provided by the user from being
> written to the bundle. Since this is an O(N^2) algorithm, in repos
> with a large number of references, this can take up a large amount
> of time.
>
> Let's instead use a 'strset' to skip duplicates inside
> `write_bundle_refs()`. This improves the performance by around 6
> times when tested against a repository with 100000 refs:
>
>   Benchmark 1: bundle (refcount = 100000, revision = master)
>     Time (mean ± σ):     14.653 s ±  0.203 s    [User: 13.940 s, System: 0.762 s]
>     Range (min … max):   14.237 s … 14.920 s    10 runs
>
>   Benchmark 2: bundle (refcount = 100000, revision = HEAD)
>     Time (mean ± σ):      2.394 s ±  0.023 s    [User: 1.684 s, System: 0.798 s]
>     Range (min … max):    2.364 s …  2.425 s    10 runs
>
>   Summary
>     bundle (refcount = 100000, revision = HEAD) ran
>       6.12 ± 0.10 times faster than bundle (refcount = 100000, revision = master)

I've done some benchmarking with a few "real life" repositories, which
only have a couple of thousand refs, and there the difference is
(expectedly) barely noticeable. It is good to know that there is no
regression in that case either.

This version looks good to me; I approve.

-- 
Toon
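
P.S. for list readers who have not opened the series itself: the shape
of the strset-based dedup is roughly the following. This is only a
sketch against the strset API declared in strmap.h, not the code from
the patch; write_refs_once() and the "write it to the bundle" step are
made-up stand-ins for what write_bundle_refs() actually does.

  #include "git-compat-util.h"
  #include "strmap.h"

  /*
   * Illustrative only: skip duplicate refnames in a single O(N) pass.
   * strset_add() returns 1 only the first time a given string is
   * added, so a repeated ref is simply skipped instead of being weeded
   * out afterwards by the O(N^2) object_array_remove_duplicates()
   * scan.
   */
  static void write_refs_once(const char **refnames, size_t nr)
  {
          struct strset seen;
          size_t i;

          strset_init(&seen);
          for (i = 0; i < nr; i++) {
                  if (!strset_add(&seen, refnames[i]))
                          continue;       /* duplicate, already written */
                  /* ... write refnames[i] to the bundle here ... */
          }
          strset_clear(&seen);
  }

If I read strmap.h correctly, the set keeps its own copies of the keys
by default, so the caller does not have to worry about the lifetime of
the refnames it feeds in.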