Re: [PATCH v2 04/13] p5313: add performance tests for --path-walk

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 24, 2025 at 03:22:40PM +0000, Derrick Stolee via GitGitGadget wrote:
> The cases where the --path-walk option really shines is when the default
> name-hash is overwhelmed with collisions. An open source example can be
> found in the microsoft/fluentui repo [1] at a certain commit [2].
>
> [1] https://github.com/microsoft/fluentui
> [2] e70848ebac1cd720875bccaa3026f4a9ed700e08

A meta-comment that occurred to me at this point while reading through
the series. I typically think of the namehash function here as wanting
to have some amount of collisions so that objects which are related but
have slightly different paths (e.g., due to a rename, or similar) are
delta'd against each other.

In my mind, the problem isn't that the hash function has collisions, but
that it has *bad* collisions between two paths where the content of each
object is completely unrelated to the other, and any effort to find
deltas here would be wasted at best.

I'm not sure I have a concrete suggestion for how to reword this, but I
figured it was worth mentioning since not all collisions are bad or
created equal ;-).

> Running the tests on this repo results in the following comparison table:
>
> Pack Type            Hash v1    Hash v2    Path Walk
> ---------------------------------------------------
> thin pack    (time)    0.36s      0.12s      0.08s
>              (size)    1.2M      22.0K      18.4K
> big pack     (time)    2.00s      2.90s      2.21s
>              (size)   20.4M      25.9M      19.5M
> shallow pack (time)    1.41s      1.80s      1.65s
>              (size)   34.4M      33.7M      33.6M

Thanks for reformatting. As a side-note (unrelated to your series), I do
think it would be nice to have a t/perf-native way of doing this sort of
multi-dimensional comparison.

> ---
>  t/perf/p5313-pack-objects.sh | 37 ++++++++++++++++++++++--------------
>  1 file changed, 23 insertions(+), 14 deletions(-)

The patch itself makes perfect sense to me.

Thanks,
Taylor




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux