[PATCH 1/4] pack-bitmap: write lookup table extension by default

The lookup table extension for reachability bitmaps was first introduced
via 3fe0121479 (Merge branch 'ac/bitmap-lookup-table', 2022-09-05).
For each bitmapped commit, the lookup table encodes three unsigned
integers:

  - The pack position of the commit.
  - An offset within the *.bitmap itself where that commit's bitmap
    lives.
  - An index into the same table where information on that commit's XOR
    pair can be found.

Lookup tables make bitmap operations faster because they no longer have
to read the entire *.bitmap file to discover what commits have
corresponding reachability bitmaps.
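
As a rough illustration of the triplet described above, the sketch below
models one table entry and a lookup by pack position. It is not the code
in pack-bitmap.c: the struct name, field names, field widths, and the
find_entry() helper are hypothetical, and it assumes the entries are kept
sorted by pack position so that a binary search applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical in-memory view of one lookup table entry. The three
 * fields mirror the triplet above; the widths are picked for
 * illustration only.
 */
struct lookup_entry {
	uint32_t commit_pos; /* pack position of the commit */
	uint64_t offset;     /* where the commit's bitmap lives in the *.bitmap */
	uint32_t xor_row;    /* index of the XOR pair's entry in this table */
};

/*
 * Binary search for `commit_pos`. Assumes `table` is sorted by
 * commit_pos in ascending order. Returns NULL when the commit has no
 * entry, i.e. no reachability bitmap.
 */
static const struct lookup_entry *find_entry(const struct lookup_entry *table,
					      size_t nr, uint32_t commit_pos)
{
	size_t lo = 0, hi = nr;

	assert(table || !nr);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table[mid].commit_pos == commit_pos)
			return &table[mid];
		if (table[mid].commit_pos < commit_pos)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}
```

With a table like this, a reader that wants the bitmap for a single
commit can seek straight to `offset` instead of scanning every bitmap in
the file.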

When bitmap lookup tables were first introduced, we established a
baseline level of performance in p5310 with and without lookup tables.
Here is the baseline without lookup tables:

    Test                                                    this tree
    -----------------------------------------------------------------------
    5310.6: simulated clone                               14.04(5.78+1.79)
    5310.7: simulated fetch                               1.95(3.05+0.20)
    5310.8: pack to file (bitmap)                         44.73(20.55+7.45)
    5310.9: rev-list (commits)                            0.78(0.46+0.10)
    5310.10: rev-list (objects)                           4.07(3.97+0.08)
    5310.11: rev-list with tag negated via --not          0.06(0.02+0.03)
             --all (objects)
    5310.12: rev-list with negative tag (objects)         0.21(0.15+0.05)
    5310.13: rev-list count with blob:none                0.24(0.17+0.06)
    5310.14: rev-list count with blob:limit=1k            7.07(5.92+0.48)
    5310.15: rev-list count with tree:0                   0.25(0.17+0.07)
    5310.16: simulated partial clone                      5.67(3.28+0.64)
    5310.18: clone (partial bitmap)                       16.05(8.34+1.86)
    5310.19: pack to file (partial bitmap)                59.76(27.22+7.43)
    5310.20: rev-list with tree filter (partial bitmap)   0.90(0.18+0.16)

Here is the same set of tests, this time with the lookup table
enabled:

    Test                                                    this tree
    -----------------------------------------------------------------------
    5310.26: simulated clone                              13.69(5.72+1.78)
    5310.27: simulated fetch                              1.84(3.02+0.16)
    5310.28: pack to file (bitmap)                        45.63(20.67+7.50)
    5310.29: rev-list (commits)                           0.56(0.39+0.8)
    5310.30: rev-list (objects)                           3.77(3.74+0.08)
    5310.31: rev-list with tag negated via --not          0.05(0.02+0.03)
             --all (objects)
    5310.32: rev-list with negative tag (objects)         0.21(0.15+0.05)
    5310.33: rev-list count with blob:none                0.23(0.17+0.05)
    5310.34: rev-list count with blob:limit=1k            6.65(5.72+0.40)
    5310.35: rev-list count with tree:0                   0.23(0.16+0.06)
    5310.36: simulated partial clone                      5.57(3.26+0.59)
    5310.38: clone (partial bitmap)                       15.89(8.39+1.84)
    5310.39: pack to file (partial bitmap)                58.32(27.55+7.47)
    5310.40: rev-list with tree filter (partial bitmap)   0.73(0.18+0.15)

(All numbers here come from a ~2022-era copy of the kernel, via
Abhradeep Chakraborty, who implemented the lookup table extension [1].)

In the almost three years since lookup tables were introduced, GitHub
has used them in production without issue, taking advantage of the above
performance benefits along the way.

Since this feature has had sufficient time to flush out any bugs and/or
performance regressions, let's enable it by default so that all bitmap
users can reap similar performance benefits.

[1]: https://lore.kernel.org/git/pull.1266.git.1655728395.gitgitgadget@xxxxxxxxx/

Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
---
 Documentation/config/pack.adoc | 2 +-
 builtin/multi-pack-index.c     | 1 +
 builtin/pack-objects.c         | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/pack.adoc b/Documentation/config/pack.adoc
index da527377fa..ba538e3d9c 100644
--- a/Documentation/config/pack.adoc
+++ b/Documentation/config/pack.adoc
@@ -191,7 +191,7 @@ pack.writeBitmapLookupTable::
 	bitmap index (if one is written). This table is used to defer
 	loading individual bitmaps as late as possible. This can be
 	beneficial in repositories that have relatively large bitmap
-	indexes. Defaults to false.
+	indexes. Defaults to true.
 
 pack.readReverseIndex::
 	When true, git will read any .rev file(s) that may be available
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 2a938466f5..6ad6f814e3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -142,6 +142,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv,
 	int ret;
 
 	opts.flags |= MIDX_WRITE_BITMAP_HASH_CACHE;
+	opts.flags |= MIDX_WRITE_BITMAP_LOOKUP_TABLE;
 
 	git_config(git_multi_pack_index_write_config, NULL);
 
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3973267e9e..384fefbb1d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -239,7 +239,7 @@ static enum {
 	WRITE_BITMAP_QUIET,
 	WRITE_BITMAP_TRUE,
 } write_bitmap_index;
-static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE;
+static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE | BITMAP_OPT_LOOKUP_TABLE;
 
 static int exclude_promisor_objects;
 static int exclude_promisor_objects_best_effort;
-- 
2.49.0.226.g0e6cae136d




