On Mon, Mar 24, 2025 at 03:22:40PM +0000, Derrick Stolee via GitGitGadget wrote: > The cases where the --path-walk option really shines is when the default > name-hash is overwhelmed with collisions. An open source example can be > found in the microsoft/fluentui repo [1] at a certain commit [2]. > > [1] https://github.com/microsoft/fluentui > [2] e70848ebac1cd720875bccaa3026f4a9ed700e08 A meta-comment that occurred to me at this point while reading through the series. I typically think of the namehash function here as wanting to have some amount of collisions so that objects which are related but have slightly different paths (e.g., due to a rename, or similar) are delta'd against each other. In my mind, the problem isn't that the hash function has collisions, but that it has *bad* collisions between two paths where the content of each object is completely unrelated to the other, and any effort to find deltas here would be wasted at best. I'm not sure I have a concrete suggestion for how to reword this, but I figured it was worth mentioning since not all collisions are bad or created equal ;-). > Running the tests on this repo results in the following comparison table: > > Pack Type Hash v1 Hash v2 Path Walk > --------------------------------------------------- > thin pack (time) 0.36s 0.12s 0.08s > (size) 1.2M 22.0K 18.4K > big pack (time) 2.00s 2.90s 2.21s > (size) 20.4M 25.9M 19.5M > shallow pack (time) 1.41s 1.80s 1.65s > (size) 34.4M 33.7M 33.6M Thanks for reformatting. As a side-note (unrelated to your series), I do think it would be nice to have a t/perf-native way of doing this sort of multi-dimensional comparison. > --- > t/perf/p5313-pack-objects.sh | 37 ++++++++++++++++++++++-------------- > 1 file changed, 23 insertions(+), 14 deletions(-) The patch itself makes perfect sense to me. Thanks, Taylor