From: Derrick Stolee <stolee@xxxxxxxxx> The 'loose-objects' task of 'git maintenance run' first deletes loose objects that exit within packfiles and then collects loose objects into a packfile. This second step uses an implicit limit of fifty thousand that cannot be modified by users. Add a new config option that allows this limit to be adjusted or ignored entirely. While creating tests for this option, I noticed that actually there was an off-by-one error due to the strict comparison in the limit check. I considered making the limit check turn true on equality, but instead I thought to use INT_MAX as a "no limit" barrier which should mean it's never possible to hit the limit. Thus, a new decrement to the limit is provided if the value is positive. (The restriction to positive values is to avoid underflow if INT_MIN is configured.) Signed-off-by: Derrick Stolee <stolee@xxxxxxxxx> --- Documentation/config/maintenance.adoc | 5 +++++ Documentation/git-maintenance.adoc | 18 ++++++++++------- builtin/gc.c | 10 ++++++++++ t/t7900-maintenance.sh | 28 +++++++++++++++++++++++++++ 4 files changed, 54 insertions(+), 7 deletions(-) diff --git a/Documentation/config/maintenance.adoc b/Documentation/config/maintenance.adoc index 72a9d6cf816..42f9545da0e 100644 --- a/Documentation/config/maintenance.adoc +++ b/Documentation/config/maintenance.adoc @@ -61,6 +61,11 @@ maintenance.loose-objects.auto:: loose objects is at least the value of `maintenance.loose-objects.auto`. The default value is 100. +maintenance.loose-objects.batchSize:: + This integer config option controls the maximum number of loose objects + written into a packfile during the `loose-objects` task. The default is + fifty thousand. Use value `0` to indicate no limit. + maintenance.incremental-repack.auto:: This integer config option controls how often the `incremental-repack` task should be run as part of `git maintenance run --auto`. If zero, diff --git a/Documentation/git-maintenance.adoc b/Documentation/git-maintenance.adoc index 0450d74aff1..c90b370b1fc 100644 --- a/Documentation/git-maintenance.adoc +++ b/Documentation/git-maintenance.adoc @@ -126,13 +126,17 @@ loose-objects:: objects that already exist in a pack-file; concurrent Git processes will examine the pack-file for the object data instead of the loose object. Second, it creates a new pack-file (starting with "loose-") - containing a batch of loose objects. The batch size is limited to 50 - thousand objects to prevent the job from taking too long on a - repository with many loose objects. The `gc` task writes unreachable - objects as loose objects to be cleaned up by a later step only if - they are not re-added to a pack-file; for this reason it is not - advisable to enable both the `loose-objects` and `gc` tasks at the - same time. + containing a batch of loose objects. ++ +The batch size defaults to fifty thousand objects to prevent the job from +taking too long on a repository with many loose objects. Use the +`maintenance.loose-objects.batchSize` config option to adjust this size, +including a value of `0` to remove the limit. ++ +The `gc` task writes unreachable objects as loose objects to be cleaned up +by a later step only if they are not re-added to a pack-file; for this +reason it is not advisable to enable both the `loose-objects` and `gc` +tasks at the same time. incremental-repack:: The `incremental-repack` job repacks the object directory diff --git a/builtin/gc.c b/builtin/gc.c index 6672f165bda..817081e1a50 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -1163,6 +1163,7 @@ static int write_loose_object_to_stdin(const struct object_id *oid, fprintf(d->in, "%s\n", oid_to_hex(oid)); + /* If batch_size is INT_MAX, then this will return 0 always. */ return ++(d->count) > d->batch_size; } @@ -1208,6 +1209,15 @@ static int pack_loose(struct maintenance_run_opts *opts) data.count = 0; data.batch_size = 50000; + repo_config_get_int(r, "maintenance.loose-objects.batchSize", + &data.batch_size); + + /* If configured as 0, then remove limit. */ + if (!data.batch_size) + data.batch_size = INT_MAX; + else if (data.batch_size > 0) + data.batch_size--; /* Decrease for equality on limit. */ + for_each_loose_file_in_objdir(r->objects->odb->path, write_loose_object_to_stdin, NULL, diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh index 1909aed95e0..834ddb5ad68 100755 --- a/t/t7900-maintenance.sh +++ b/t/t7900-maintenance.sh @@ -306,6 +306,34 @@ test_expect_success 'maintenance.loose-objects.auto' ' test_subcommand git prune-packed --quiet <trace-loC ' +test_expect_success 'maintenance.loose-objects.batchSize' ' + git init loose-batch && + + # This creates three objects per commit. + test_commit_bulk -C loose-batch 34 && + pack=$(ls loose-batch/.git/objects/pack/pack-*.pack) && + index="${pack%pack}idx" && + rm "$index" && + git -C loose-batch unpack-objects <"$pack" && + git -C loose-batch config maintenance.loose-objects.batchSize 50 && + + GIT_PROGRESS_DELAY=0 \ + git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err && + grep "Enumerating objects: 50, done." err && + + GIT_PROGRESS_DELAY=0 \ + git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err && + grep "Enumerating objects: 50, done." err && + + GIT_PROGRESS_DELAY=0 \ + git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err && + grep "Enumerating objects: 2, done." err && + + GIT_PROGRESS_DELAY=0 \ + git -C loose-batch maintenance run --no-quiet --task=loose-objects 2>err && + test_must_be_empty err +' + test_expect_success 'incremental-repack task' ' packDir=.git/objects/pack && for i in $(test_seq 1 5) -- gitgitgadget