[PATCH v3 0/4] fetch/receive: use batched reference updates

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The git-fetch(1) and git-receive-pack(1) commands update references as
part of the flow. Each reference update is treated as a single entity
and a transaction is created for each update.

This can be really slow, specifically in reference backends where there
are optimizations which ensure a single transaction with 'N' reference
update perform much faster than 'N' individual transactions. Also having
'N' individual transactions has buildup and teardown costs. These costs
add up in repositories with a large number of references.

Also specifically in the reftable backend, 'N' individual transactions
would also trigger auto-compaction for every transaction.

The reasoning for using individual transactions is because we want to
allow partial updates of references in these commands. Using a single
transaction would be an all-or-nothing scenario.

Recently we introduced an in-between solution called batched reference
updates in 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08). This allows us to batch a set of reference updates, where
individual updates can pass/fail without affecting the batch.

This patch series, modifies both 'git-fetch(1)' and
'git-receive-pack(1)' to use this mechanism. With this, we see a
significant performance boost:

+---------------------+---------------+------------------+
|                     | files backend | reftable backend |
+---------------------+---------------+------------------+
| git-fetch(1)        | 1.25x         | 22x              |
| git-receive-pack(1) | 1.21x         | 18x              |
+---------------------+---------------+------------------+

The first and third patch handle the changes for 'git-fetch(1)' and
'git-receive-pack(1)' respectively. The second patch fixes a small
memory leak I encountered while working on this series.

This is based on top of master: 7a1d2bd0a5 (Merge branch 'master' of
https://github.com/j6t/gitk, 2025-05-09). There were no conflicts
observed with next or seen.

Junio, since the release window for 2.50 is closing in. I would prefer
we mark this for 2.51, so perhaps when/if we merge it to 'next', we let
it bake there till the next release window opens. While all the tests
pass, any bugs here would be high impact and it would be nice to catch
it before it hits a release.

---
Changes in v3:
- Modify `ref_transaction_error_msg()` to no longer provided an
  allocated string, but rather a 'const char *' string literal, this
  cleans up unnecessary allocation/free calls.
- Remove some unnecessary type casts.
- Fix some typos.
- Link to v2: https://lore.kernel.org/r/20250515-501-update-git-fetch-1-to-use-partial-transactions-v2-0-80cbaaa55d2e@xxxxxxxxx

Changes in v2:
- Added a new commit to introduce a mapping from 'ref_transaction_error'
  to a human readable string. We now use this to map the errors
  observed.
- Add a TODO regarding handling pruning and creation of refs in the same
  transaction in 'git-fetch(1)'.
- Cleanup commit message typos.
- Fix logic in 'builtin/receive-pack.c' around error reporting to be
  cleaner. Thanks Patrick for the suggestion.
- Link to v1: https://lore.kernel.org/r/20250514-501-update-git-fetch-1-to-use-partial-transactions-v1-0-7c65f46493d4@xxxxxxxxx

---
 builtin/fetch.c                  | 127 ++++++++++++++++++++++-----------------
 builtin/receive-pack.c           |  64 +++++++++++++++-----
 builtin/update-ref.c             |  25 +-------
 refs.c                           |  20 ++++++
 refs.h                           |   5 ++
 send-pack.c                      |   7 +++
 t/t1416-ref-transaction-hooks.sh |   2 -
 t/t5408-send-pack-stdin.sh       |  15 ++++-
 8 files changed, 166 insertions(+), 99 deletions(-)

Karthik Nayak (4):
      refs: add function to translate errors to strings
      fetch: use batched reference updates
      send-pack: fix memory leak around duplicate refs
      receive-pack: use batched reference updates

Range-diff versus v2:

1:  5196124be4 ! 1:  5bd1ffe4d2 refs: add function to translate errors to strings
    @@ Commit message
         to 'refs.c' as `ref_transaction_error_msg()` and use the same within the
         'builtin/update-ref.c'.
     
    +    Helped-by: Junio C Hamano <gitster@xxxxxxxxx>
         Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
     
      ## builtin/update-ref.c ##
    @@ builtin/update-ref.c: static void print_rejected_refs(const char *refname,
     -	default:
     -		reason = "unkown failure";
     -	}
    -+	char *reason = ref_transaction_error_msg(err);
    ++	const char *reason = ref_transaction_error_msg(err);
      
      	strbuf_addf(&sb, "rejected %s %s %s %s\n", refname,
      		    new_oid ? oid_to_hex(new_oid) : new_target,
    -@@ builtin/update-ref.c: static void print_rejected_refs(const char *refname,
    - 		    reason);
    - 
    - 	fwrite(sb.buf, sb.len, 1, stdout);
    -+	free(reason);
    - 	strbuf_release(&sb);
    - }
    - 
     
      ## refs.c ##
     @@ refs.c: int ref_update_expects_existing_old_ref(struct ref_update *update)
    @@ refs.c: int ref_update_expects_existing_old_ref(struct ref_update *update)
      		(!is_null_oid(&update->old_oid) || update->old_target);
      }
     +
    -+char *ref_transaction_error_msg(enum ref_transaction_error err)
    ++const char *ref_transaction_error_msg(enum ref_transaction_error err)
     +{
    -+	const char *reason = "";
    -+
     +	switch (err) {
     +	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
    -+		reason = "refname conflict";
    -+		break;
    ++		return "refname conflict";
     +	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
    -+		reason = "reference already exists";
    -+		break;
    ++		return "reference already exists";
     +	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
    -+		reason = "reference does not exist";
    -+		break;
    ++		return "reference does not exist";
     +	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
    -+		reason = "incorrect old value provided";
    -+		break;
    ++		return "incorrect old value provided";
     +	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
    -+		reason = "invalid new value provided";
    -+		break;
    ++		return "invalid new value provided";
     +	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
    -+		reason = "expected symref but found regular ref";
    -+		break;
    ++		return "expected symref but found regular ref";
     +	default:
    -+		reason = "unkown failure";
    ++		return "unknown failure";
     +	}
    -+
    -+	return xstrdup(reason);
     +}
     
      ## refs.h ##
    @@ refs.h: void ref_transaction_for_each_rejected_update(struct ref_transaction *tr
     +/*
     + * Translate errors to human readable error messages.
     + */
    -+char *ref_transaction_error_msg(enum ref_transaction_error err);
    ++const char *ref_transaction_error_msg(enum ref_transaction_error err);
     +
      /*
       * Free `*transaction` and all associated data.
2:  c2014b959f ! 2:  dadabf1918 fetch: use batched reference updates
    @@ builtin/fetch.c: static int set_head(const struct ref *remote_refs, struct remot
     +					      enum ref_transaction_error err,
     +					      void *cb_data)
     +{
    -+	struct ref_rejection_data *data = (struct ref_rejection_data *)cb_data;
    ++	struct ref_rejection_data *data = cb_data;
     +
     +	if (err == REF_TRANSACTION_ERROR_NAME_CONFLICT && !data->conflict_msg_shown) {
     +		error(_("some local refs could not be updated; try running\n"
    @@ builtin/fetch.c: static int set_head(const struct ref *remote_refs, struct remot
     +			"branches"), data->remote_name);
     +		data->conflict_msg_shown = 1;
     +	} else {
    -+		char *reason = ref_transaction_error_msg(err);
    ++		const char *reason = ref_transaction_error_msg(err);
     +
     +		error(_("fetching ref %s failed: %s"), refname, reason);
    -+		free(reason);
     +	}
     +
     +	*data->retcode = 1;
    @@ builtin/fetch.c: static int do_fetch(struct transport *transport,
     +	 *
     +	 * TODO: if reference transactions gain logical conflict resolution, we
     +	 * can delete and create refs (with F/D conflicts) in the same transaction
    -+	 * and this can be moved about the 'prune_refs()' block.
    ++	 * and this can be moved above the 'prune_refs()' block.
     +	 */
     +	if (!transaction) {
     +		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
3:  43f0dd3c31 = 3:  4006905807 send-pack: fix memory leak around duplicate refs
4:  a976afa025 ! 4:  0549b78b48 receive-pack: use batched reference updates
    @@ Commit message
         batch updates use a transaction under the hood. This explains the change
         in 't1416'.
     
    +    Helped-by: Jeff King <peff@xxxxxxxx>
         Helped-by: Patrick Steinhardt <ps@xxxxxx>
         Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
     
    @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
     +{
     +	struct strmap *failed_refs = cb_data;
     +
    -+	strmap_put(failed_refs, refname, ref_transaction_error_msg(err));
    ++	strmap_put(failed_refs, refname, (char *)ref_transaction_error_msg(err));
     +}
     +
      static void execute_commands_non_atomic(struct command *commands,
    @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
     +		if (reported_error)
     +			cmd->error_string = reported_error;
     +		else if (strmap_contains(&failed_refs, cmd->ref_name))
    -+			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
    ++			cmd->error_string = strmap_get(&failed_refs, cmd->ref_name);
      	}
     +
     +cleanup:
     +	ref_transaction_free(transaction);
    -+	strmap_clear(&failed_refs, 1);
    ++	strmap_clear(&failed_refs, 0);
      	strbuf_release(&err);
      }
      
    @@ t/t5408-send-pack-stdin.sh: test_expect_success 'stdin mixed with cmdline' '
      	echo A:foo >input &&
      	test_must_fail git send-pack remote.git --stdin B:foo <input &&
     -	verify_push B foo
    ++	test_grep "multiple updates for ref ${SQ}refs/heads/foo${SQ} not allowed" err &&
     +	test_must_fail git --git-dir=remote.git rev-parse foo
      '
      


base-commit: 7a1d2bd0a596f42a8a7a68d55577967bb454fec0
change-id: 20250423-501-update-git-fetch-1-to-use-partial-transactions-7fb66339fe79

Thanks
- Karthik





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux