The git-fetch(1) and git-receive-pack(1) commands update references as part of the flow. Each reference update is treated as a single entity and a transaction is created for each update. This can be really slow, specifically in reference backends where there are optimizations which ensure a single transaction with 'N' reference update perform much faster than 'N' individual transactions. Also having 'N' individual transactions has buildup and teardown costs. These costs add up in repositories with a large number of references. Also specifically in the reftable backend, 'N' individual transactions would also trigger auto-compaction for every transaction. The reasoning for using individual transactions is because we want to allow partial updates of references in these commands. Using a single transaction would be an all-or-nothing scenario. Recently we introduced an in-between solution called batched reference updates in 23fc8e4f61 (refs: implement batch reference update support, 2025-04-08). This allows us to batch a set of reference updates, where individual updates can pass/fail without affecting the batch. This patch series, modifies both 'git-fetch(1)' and 'git-receive-pack(1)' to use this mechanism. With this, we see a significant performance boost: +---------------------+---------------+------------------+ | | files backend | reftable backend | +---------------------+---------------+------------------+ | git-fetch(1) | 1.25x | 22x | | git-receive-pack(1) | 1.21x | 18x | +---------------------+---------------+------------------+ The first and third patch handle the changes for 'git-fetch(1)' and 'git-receive-pack(1)' respectively. The second patch fixes a small memory leak I encountered while working on this series. This is based on top of master: 7a1d2bd0a5 (Merge branch 'master' of https://github.com/j6t/gitk, 2025-05-09). There were no conflicts observed with next or seen. Junio, since the release window for 2.50 is closing in. I would prefer we mark this for 2.51, so perhaps when/if we merge it to 'next', we let it bake there till the next release window opens. While all the tests pass, any bugs here would be high impact and it would be nice to catch it before it hits a release. --- Changes in v3: - Modify `ref_transaction_error_msg()` to no longer provided an allocated string, but rather a 'const char *' string literal, this cleans up unnecessary allocation/free calls. - Remove some unnecessary type casts. - Fix some typos. - Link to v2: https://lore.kernel.org/r/20250515-501-update-git-fetch-1-to-use-partial-transactions-v2-0-80cbaaa55d2e@xxxxxxxxx Changes in v2: - Added a new commit to introduce a mapping from 'ref_transaction_error' to a human readable string. We now use this to map the errors observed. - Add a TODO regarding handling pruning and creation of refs in the same transaction in 'git-fetch(1)'. - Cleanup commit message typos. - Fix logic in 'builtin/receive-pack.c' around error reporting to be cleaner. Thanks Patrick for the suggestion. - Link to v1: https://lore.kernel.org/r/20250514-501-update-git-fetch-1-to-use-partial-transactions-v1-0-7c65f46493d4@xxxxxxxxx --- builtin/fetch.c | 127 ++++++++++++++++++++++----------------- builtin/receive-pack.c | 64 +++++++++++++++----- builtin/update-ref.c | 25 +------- refs.c | 20 ++++++ refs.h | 5 ++ send-pack.c | 7 +++ t/t1416-ref-transaction-hooks.sh | 2 - t/t5408-send-pack-stdin.sh | 15 ++++- 8 files changed, 166 insertions(+), 99 deletions(-) Karthik Nayak (4): refs: add function to translate errors to strings fetch: use batched reference updates send-pack: fix memory leak around duplicate refs receive-pack: use batched reference updates Range-diff versus v2: 1: 5196124be4 ! 1: 5bd1ffe4d2 refs: add function to translate errors to strings @@ Commit message to 'refs.c' as `ref_transaction_error_msg()` and use the same within the 'builtin/update-ref.c'. + Helped-by: Junio C Hamano <gitster@xxxxxxxxx> Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx> ## builtin/update-ref.c ## @@ builtin/update-ref.c: static void print_rejected_refs(const char *refname, - default: - reason = "unkown failure"; - } -+ char *reason = ref_transaction_error_msg(err); ++ const char *reason = ref_transaction_error_msg(err); strbuf_addf(&sb, "rejected %s %s %s %s\n", refname, new_oid ? oid_to_hex(new_oid) : new_target, -@@ builtin/update-ref.c: static void print_rejected_refs(const char *refname, - reason); - - fwrite(sb.buf, sb.len, 1, stdout); -+ free(reason); - strbuf_release(&sb); - } - ## refs.c ## @@ refs.c: int ref_update_expects_existing_old_ref(struct ref_update *update) @@ refs.c: int ref_update_expects_existing_old_ref(struct ref_update *update) (!is_null_oid(&update->old_oid) || update->old_target); } + -+char *ref_transaction_error_msg(enum ref_transaction_error err) ++const char *ref_transaction_error_msg(enum ref_transaction_error err) +{ -+ const char *reason = ""; -+ + switch (err) { + case REF_TRANSACTION_ERROR_NAME_CONFLICT: -+ reason = "refname conflict"; -+ break; ++ return "refname conflict"; + case REF_TRANSACTION_ERROR_CREATE_EXISTS: -+ reason = "reference already exists"; -+ break; ++ return "reference already exists"; + case REF_TRANSACTION_ERROR_NONEXISTENT_REF: -+ reason = "reference does not exist"; -+ break; ++ return "reference does not exist"; + case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE: -+ reason = "incorrect old value provided"; -+ break; ++ return "incorrect old value provided"; + case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE: -+ reason = "invalid new value provided"; -+ break; ++ return "invalid new value provided"; + case REF_TRANSACTION_ERROR_EXPECTED_SYMREF: -+ reason = "expected symref but found regular ref"; -+ break; ++ return "expected symref but found regular ref"; + default: -+ reason = "unkown failure"; ++ return "unknown failure"; + } -+ -+ return xstrdup(reason); +} ## refs.h ## @@ refs.h: void ref_transaction_for_each_rejected_update(struct ref_transaction *tr +/* + * Translate errors to human readable error messages. + */ -+char *ref_transaction_error_msg(enum ref_transaction_error err); ++const char *ref_transaction_error_msg(enum ref_transaction_error err); + /* * Free `*transaction` and all associated data. 2: c2014b959f ! 2: dadabf1918 fetch: use batched reference updates @@ builtin/fetch.c: static int set_head(const struct ref *remote_refs, struct remot + enum ref_transaction_error err, + void *cb_data) +{ -+ struct ref_rejection_data *data = (struct ref_rejection_data *)cb_data; ++ struct ref_rejection_data *data = cb_data; + + if (err == REF_TRANSACTION_ERROR_NAME_CONFLICT && !data->conflict_msg_shown) { + error(_("some local refs could not be updated; try running\n" @@ builtin/fetch.c: static int set_head(const struct ref *remote_refs, struct remot + "branches"), data->remote_name); + data->conflict_msg_shown = 1; + } else { -+ char *reason = ref_transaction_error_msg(err); ++ const char *reason = ref_transaction_error_msg(err); + + error(_("fetching ref %s failed: %s"), refname, reason); -+ free(reason); + } + + *data->retcode = 1; @@ builtin/fetch.c: static int do_fetch(struct transport *transport, + * + * TODO: if reference transactions gain logical conflict resolution, we + * can delete and create refs (with F/D conflicts) in the same transaction -+ * and this can be moved about the 'prune_refs()' block. ++ * and this can be moved above the 'prune_refs()' block. + */ + if (!transaction) { + transaction = ref_store_transaction_begin(get_main_ref_store(the_repository), 3: 43f0dd3c31 = 3: 4006905807 send-pack: fix memory leak around duplicate refs 4: a976afa025 ! 4: 0549b78b48 receive-pack: use batched reference updates @@ Commit message batch updates use a transaction under the hood. This explains the change in 't1416'. + Helped-by: Jeff King <peff@xxxxxxxx> Helped-by: Patrick Steinhardt <ps@xxxxxx> Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx> @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com +{ + struct strmap *failed_refs = cb_data; + -+ strmap_put(failed_refs, refname, ref_transaction_error_msg(err)); ++ strmap_put(failed_refs, refname, (char *)ref_transaction_error_msg(err)); +} + static void execute_commands_non_atomic(struct command *commands, @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com + if (reported_error) + cmd->error_string = reported_error; + else if (strmap_contains(&failed_refs, cmd->ref_name)) -+ cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name)); ++ cmd->error_string = strmap_get(&failed_refs, cmd->ref_name); } + +cleanup: + ref_transaction_free(transaction); -+ strmap_clear(&failed_refs, 1); ++ strmap_clear(&failed_refs, 0); strbuf_release(&err); } @@ t/t5408-send-pack-stdin.sh: test_expect_success 'stdin mixed with cmdline' ' echo A:foo >input && test_must_fail git send-pack remote.git --stdin B:foo <input && - verify_push B foo ++ test_grep "multiple updates for ref ${SQ}refs/heads/foo${SQ} not allowed" err && + test_must_fail git --git-dir=remote.git rev-parse foo ' base-commit: 7a1d2bd0a596f42a8a7a68d55577967bb454fec0 change-id: 20250423-501-update-git-fetch-1-to-use-partial-transactions-7fb66339fe79 Thanks - Karthik