On Mon, Mar 24, 2025 at 03:22:47PM +0000, Derrick Stolee via GitGitGadget wrote: > Using the Git repository as a test repo, the p5313 performance test > shows that the resulting size of the repo is the same, but the threaded > implementation gives gains of varying degrees depending on the number of > objects being packed. (This was tested on a 16-core machine.) > > Test HEAD~1 HEAD > --------------------------------------------------- > 5313.20: big pack 2.38 1.99 -16.4% > 5313.21: big pack size 16.1M 16.0M -0.2% > 5313.24: repack 107.32 45.41 -57.7% > 5313.25: repack size 213.3M 213.2M -0.0% Very nice indeed! > This ~60% reduction in 'git repack --path-walk' time is typical across > all repos I used for testing. What is interesting is to compare when the > overall time improves enough to outperform the --name-hash-version=1 > case. These time improvements correlate with repositories with data > shapes that significantly improve their data size as well. The I had to read this sentence a couple of times to wrap my head around it, but perhaps you can double-check my understanding. Are you saying that reducing the size of the resulting packs suggests that our runtime with the path-walk algorithm is improved enough to be competitive with (or better than) the non-path-walk implementation with name-hash v1? I think that's what you mean, and I admit that I don't find my own restatement to be all that much clearer. So I think it's fine to leave this as-is, though it is interesting to think about why the statement is true. I suspect (without having tested it thoroughly) that a significant proportion of the time saved on the path-walk side is because we found better deltas that are easier to compress, as a result of a more well-tuned search technique. I don't think you necessarily have to get too far into the details here, but I was curious (not for commit message polishing, but purely for my own curiosity) if you had any thoughts or measurements of why this might be the case. But... > --path-walk feature frequently takes longer than --name-hash-verison=2, > trading some extrac computation for some additional compression. The s/extrac/extra/ > natural place where this additional computation comes from is the two > compression passes that --path-walk takes, though the first pass is > naturally faster due to the path boundaries avoiding a number of delta > compression attempts. ... I should have kept reading before commenting the above! ;-) > Signed-off-by: Derrick Stolee <stolee@xxxxxxxxx> > --- > builtin/pack-objects.c | 163 ++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 161 insertions(+), 2 deletions(-) > > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index d4e05ca4434..2a6246c1e78 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -2964,6 +2964,7 @@ static void find_deltas(struct object_entry **list, unsigned *list_size, > struct thread_params { > pthread_t thread; > struct object_entry **list; > + struct packing_region *regions; > unsigned list_size; > unsigned remaining; > int window; > @@ -3281,6 +3282,164 @@ static void find_deltas_by_region(struct object_entry *list, > stop_progress(&progress_state); > } > > +static void *threaded_find_deltas_by_path(void *arg) > +{ > + struct thread_params *me = arg; > + > + progress_lock(); > + while (me->remaining) { > + while (me->remaining) { I am not quite following the double-while loop here. Could you help clarify what's going on here? Is this due to the work-stealing below? > +static void ll_find_deltas_by_region(struct object_entry *list, > + struct packing_region *regions, > + uint32_t start, uint32_t nr) > +{ > + struct thread_params *p; > + int i, ret, active_threads = 0; > + unsigned int processed = 0; > + uint32_t progress_nr; > + init_threaded_search(); > + > + if (!nr) > + return; > + > + progress_nr = regions[nr - 1].start + regions[nr - 1].nr; > + if (delta_search_threads <= 1) { > + find_deltas_by_region(list, regions, start, nr); > + cleanup_threaded_search(); > + return; > + } > + > + if (progress > pack_to_stdout) > + fprintf_ln(stderr, _("Path-based delta compression using up to %d threads"), > + delta_search_threads); This looks a copy-and-paste from the non-path-walk code, but I wonder if it might be worth using Q_() here instead of _() to provide better output in the case that delta_search_threads is 1. The rest of the implementation of ll_find_deltas_by_region() looks good to me. Thanks, Taylor