Re: [PATCH v2] builtin/repack: Honor --keep-pack and .keep when repacking promisor objects

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Han Young <hanyang.tony@xxxxxxxxxxxxx> writes:

> From: Tomáš Trnka <trnka@xxxxxxx>
>
> git-repack currently does not pass --keep-pack or --honor-pack-keep to
> the git-pack-objects handling promisor packs. This means that settings
> like gc.bigPackThreshold are completely ignored for promisor packs.
>
> The simple fix is to just skip keep packs when iterating promisor packs.
>
> Signed-off-by: Han Young <hanyang.tony@xxxxxxxxxxxxx>
> Original-patch-by: Tomáš Trnka <trnka@xxxxxxx>
> ---
> The original patch passes the --keep-pack arguments to pack-objects. While
> this approach avoids repacking the keep packs, the objects in the keep
> packs are still enumerated. These objects are iterated in repack and sent
> to be packed, only to be skipped in pack-objects.
>
> Filtering out the keep packs in repack_promisor_objects is more efficient,
> the objects in keep packs are not enumerated and sent to pack-objects.

Christian, I thought you have also been active around promisor area
these days, so let us pick your brain for this one, too.

Thanks.

>
> In a test repo with a 2.7G promisor pack. The original patch took longer to
> repack the repo, most of the time was spent on enumerating objects.
>
> The original patch:
>
> $ time git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack
> Enumerating objects: 19091575, done.
> Counting objects: 100% (5934/5934), done.
> Delta compression using up to 12 threads
> Compressing objects: 100% (4517/4517), done.
> Writing objects: 100% (5934/5934), done.
> Total 5934 (delta 2885), reused 2986 (delta 1192), pack-reused 0 (from 0)
> git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack  43.94s user 46.83s system 134% cpu 1:07.60 total
>
> This patch:
>
> $ time git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack
> Enumerating objects: 20952, done.
> Counting objects: 100% (18323/18323), done.
> Delta compression using up to 12 threads
> Compressing objects: 100% (9022/9022), done.
> Writing objects: 100% (18323/18323), done.
> Total 18323 (delta 11340), reused 15753 (delta 8771), pack-reused 0 (from 0)
> git repack -ad --keep-pack=pack-a26479ad4f9dff58448df6fca4953844009b3920.pack  8.65s user 3.14s system 62% cpu 18.838 total
>
> I also noticed that this patch produced a smaller packfile.
> 1.6M compared to 4.5M generated by the original patch.
> I suspect this is because some objects exist in multiple packs,
> so pack-objects still repacks them even if they are actually present
> in keep-packs.
> ---
>  builtin/repack.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 59214dbdfd..fb09df53c0 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -252,8 +252,9 @@ static void existing_packs_release(struct existing_packs *existing)
>  /*
>   * Adds all packs hex strings (pack-$HASH) to either packs->non_kept
>   * or packs->kept based on whether each pack has a corresponding
> - * .keep file or not.  Packs without a .keep file are not to be kept
> - * if we are going to pack everything into one file.
> + * .keep file or not.  Packs without a .keep file or specified by
> + * --keep-pack are not to be kept if we are going to pack everything
> + * into one file.
>   */
>  static void collect_pack_filenames(struct existing_packs *existing,
>  				   const struct string_list *extra_keep)
> @@ -278,8 +279,15 @@ static void collect_pack_filenames(struct existing_packs *existing,
>  		strbuf_addstr(&buf, base);
>  		strbuf_strip_suffix(&buf, ".pack");
>  
> -		if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep)
> +		if ((extra_keep->nr > 0 && i < extra_keep->nr) || p->pack_keep) {
>  			string_list_append(&existing->kept_packs, buf.buf);
> +			/*
> +			 * mark packs specified by --keep-pack as keep in core, keep packs
> +			 * are ignored by repack_promisor_objects. This avoids passthrough
> +			 * --keep-pack args to pack-objects
> +			 */
> +			p->pack_keep_in_core = 1;
> +		}
>  		else if (p->is_cruft)
>  			string_list_append(&existing->cruft_packs, buf.buf);
>  		else
> @@ -412,7 +420,9 @@ static void repack_promisor_objects(const struct pack_objects_args *args,
>  	 * of a {type -> size} ordering, which may produce better deltas.
>  	 */
>  	for_each_packed_object(the_repository, write_oid, &cmd,
> -			       FOR_EACH_OBJECT_PROMISOR_ONLY);
> +			       FOR_EACH_OBJECT_PROMISOR_ONLY |
> +				   FOR_EACH_OBJECT_SKIP_IN_CORE_KEPT_PACKS |
> +				   FOR_EACH_OBJECT_SKIP_ON_DISK_KEPT_PACKS);
>  
>  	if (cmd.in == -1) {
>  		/* No packed objects; cmd was never started */




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux