Re: [PATCH v2 11/13] pack-objects: thread the path-based compression

Taylor Blau <me@xxxxxxxxxxxx> · Tue, 6 May 2025 21:33:35 -0400

On Mon, Mar 24, 2025 at 03:22:47PM +0000, Derrick Stolee via GitGitGadget wrote:
> Using the Git repository as a test repo, the p5313 performance test
> shows that the resulting size of the repo is the same, but the threaded
> implementation gives gains of varying degrees depending on the number of
> objects being packed. (This was tested on a 16-core machine.)
>
> Test                        HEAD~1      HEAD
> ---------------------------------------------------
> 5313.20: big pack             2.38      1.99 -16.4%
> 5313.21: big pack size       16.1M     16.0M  -0.2%
> 5313.24: repack             107.32     45.41 -57.7%
> 5313.25: repack size        213.3M    213.2M  -0.0%

Very nice indeed!

> This ~60% reduction in 'git repack --path-walk' time is typical across
> all repos I used for testing. What is interesting is to compare when the
> overall time improves enough to outperform the --name-hash-version=1
> case. These time improvements correlate with repositories with data
> shapes that significantly improve their data size as well. The

I had to read this sentence a couple of times to wrap my head around it,
but perhaps you can double-check my understanding. Are you saying that
reducing the size of the resulting packs suggests that our runtime with
the path-walk algorithm is improved enough to be competitive with (or
better than) the non-path-walk implementation with name-hash v1?

I think that's what you mean, and I admit that I don't find my own
restatement to be all that much clearer. So I think it's fine to leave
this as-is, though it is interesting to think about why the statement
is true.

I suspect (without having tested it thoroughly) that a significant
proportion of the time saved on the path-walk side is because we found
better deltas that are easier to compress, as a result of a more
well-tuned search technique. I don't think you necessarily have to get
too far into the details here, but I was curious (not for commit message
polishing, but purely for my own curiosity) if you had any thoughts or
measurements of why this might be the case.

But...

> --path-walk feature frequently takes longer than --name-hash-verison=2,
> trading some extrac computation for some additional compression. The

s/extrac/extra/

> natural place where this additional computation comes from is the two
> compression passes that --path-walk takes, though the first pass is
> naturally faster due to the path boundaries avoiding a number of delta
> compression attempts.

... I should have kept reading before commenting the above! ;-)

> Signed-off-by: Derrick Stolee <stolee@xxxxxxxxx>
> ---
>  builtin/pack-objects.c | 163 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 161 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index d4e05ca4434..2a6246c1e78 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -2964,6 +2964,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size,
>  struct thread_params {
>  	pthread_t thread;
>  	struct object_entry **list;
> +	struct packing_region *regions;
>  	unsigned list_size;
>  	unsigned remaining;
>  	int window;
> @@ -3281,6 +3282,164 @@ static void find_deltas_by_region(struct object_entry *list,
>  	stop_progress(&progress_state);
>  }
>
> +static void *threaded_find_deltas_by_path(void *arg)
> +{
> +	struct thread_params *me = arg;
> +
> +	progress_lock();
> +	while (me->remaining) {
> +		while (me->remaining) {

I am not quite following the double-while loop here. Could you help
clarify what's going on here? Is this due to the work-stealing below?

> +static void ll_find_deltas_by_region(struct object_entry *list,
> +				     struct packing_region *regions,
> +				     uint32_t start, uint32_t nr)
> +{
> +	struct thread_params *p;
> +	int i, ret, active_threads = 0;
> +	unsigned int processed = 0;
> +	uint32_t progress_nr;
> +	init_threaded_search();
> +
> +	if (!nr)
> +		return;
> +
> +	progress_nr =  regions[nr - 1].start + regions[nr - 1].nr;
> +	if (delta_search_threads <= 1) {
> +		find_deltas_by_region(list, regions, start, nr);
> +		cleanup_threaded_search();
> +		return;
> +	}
> +
> +	if (progress > pack_to_stdout)
> +		fprintf_ln(stderr, _("Path-based delta compression using up to %d threads"),
> +			   delta_search_threads);

This looks a copy-and-paste from the non-path-walk code, but I wonder if
it might be worth using Q_() here instead of _() to provide better
output in the case that delta_search_threads is 1.

The rest of the implementation of ll_find_deltas_by_region() looks good
to me.

Thanks,
Taylor