[PATCH v2 0/4] fetch/receive: use batched reference updates

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The git-fetch(1) and git-receive-pack(1) commands update references as
part of the flow. Each reference update is treated as a single entity
and a transaction is created for each update.

This can be really slow, specifically in reference backends where there
are optimizations which ensure a single transaction with 'N' reference
update perform much faster than 'N' individual transactions. Also having
'N' individual transactions has buildup and teardown costs. These costs
add up in repositories with a large number of references.

Also specifically in the reftable backend, 'N' individual transactions
would also trigger auto-compaction for every transaction.

The reasoning for using individual transactions is because we want to
allow partial updates of references in these commands. Using a single
transaction would be an all-or-nothing scenario.

Recently we introduced an in-between solution called batched reference
updates in 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08). This allows us to batch a set of reference updates, where
individual updates can pass/fail without affecting the batch.

This patch series, modifies both 'git-fetch(1)' and
'git-receive-pack(1)' to use this mechanism. With this, we see a
significant performance boost:

+---------------------+---------------+------------------+
|                     | files backend | reftable backend |
+---------------------+---------------+------------------+
| git-fetch(1)        | 1.25x         | 22x              |
| git-receive-pack(1) | 1.21x         | 18x              |
+---------------------+---------------+------------------+

The first and third patch handle the changes for 'git-fetch(1)' and
'git-receive-pack(1)' respectively. The second patch fixes a small
memory leak I encountered while working on this series.

This is based on top of master: 7a1d2bd0a5 (Merge branch 'master' of
https://github.com/j6t/gitk, 2025-05-09). There were no conflicts
observed with next or seen.

Junio, since the release window for 2.50 is closing in. I would prefer
we mark this for 2.51, so perhaps when/if we merge it to 'next', we let
it bake there till the next release window opens. While all the tests
pass, any bugs here would be high impact and it would be nice to catch
it before it hits a release.

---
Changes in v2:
- Added a new commit to introduce a mapping from 'ref_transaction_error'
  to a human readable string. We now use this to map the errors
  observed.
- Add a TODO regarding handling pruning and creation of refs in the same
  transaction in 'git-fetch(1)'.
- Cleanup commit message typos.
- Fix logic in 'builtin/receive-pack.c' around error reporting to be
  cleaner. Thanks Patrick for the suggestion.
- Link to v1: https://lore.kernel.org/r/20250514-501-update-git-fetch-1-to-use-partial-transactions-v1-0-7c65f46493d4@xxxxxxxxx

---
 builtin/fetch.c                  | 128 ++++++++++++++++++++++-----------------
 builtin/receive-pack.c           |  64 +++++++++++++++-----
 builtin/update-ref.c             |  26 +-------
 refs.c                           |  30 +++++++++
 refs.h                           |   5 ++
 send-pack.c                      |   7 +++
 t/t1416-ref-transaction-hooks.sh |   2 -
 t/t5408-send-pack-stdin.sh       |  14 ++++-
 8 files changed, 177 insertions(+), 99 deletions(-)

Karthik Nayak (4):
      refs: add function to translate errors to strings
      fetch: use batched reference updates
      send-pack: fix memory leak around duplicate refs
      receive-pack: use batched reference updates

Range-diff versus v1:

-:  ---------- > 1:  8c2b3d08d0 refs: add function to translate errors to strings
1:  b6a257eeff ! 2:  1683b19f10 fetch: use batched reference updates
    @@ builtin/fetch.c: static int s_update_ref(const char *action,
     -		ret = STORE_REF_ERROR_OTHER;
     -		goto out;
     -	}
    - 
    +-
     -	if (our_transaction) {
     -		switch (ref_transaction_commit(our_transaction, &err)) {
     -		case 0:
    @@ builtin/fetch.c: static int s_update_ref(const char *action,
     -			goto out;
     -		}
     -	}
    --
    + 
     -out:
     -	ref_transaction_free(our_transaction);
      	if (ret)
    @@ builtin/fetch.c: static int set_head(const struct ref *remote_refs, struct remot
     +	const char *remote_name;
     +};
     +
    -+static void ref_transaction_rejection_handler(const char *refname UNUSED,
    ++static void ref_transaction_rejection_handler(const char *refname,
     +					      const struct object_id *old_oid UNUSED,
     +					      const struct object_id *new_oid UNUSED,
     +					      const char *old_target UNUSED,
    @@ builtin/fetch.c: static int set_head(const struct ref *remote_refs, struct remot
     +			" 'git remote prune %s' to remove any old, conflicting "
     +			"branches"), data->remote_name);
     +		data->conflict_msg_shown = 1;
    ++	} else {
    ++		char *reason = ref_transaction_error_msg(err);
    ++
    ++		error(_("fetching ref %s failed: %s"), refname, reason);
    ++		free(reason);
     +	}
     +
     +	*data->retcode = 1;
    @@ builtin/fetch.c: static int do_fetch(struct transport *transport,
      
     +	/*
     +	 * If not atomic, we can still use batched updates, which would be much
    -+	 * more performent. We don't initiate the transaction before pruning,
    ++	 * more performant. We don't initiate the transaction before pruning,
     +	 * since pruning must be an independent step, to avoid F/D conflicts.
    ++	 *
    ++	 * TODO: if reference transactions gain logical conflict resolution, we
    ++	 * can delete and create refs (with F/D conflicts) in the same transaction
    ++	 * and this can be moved about the 'prune_refs()' block.
     +	 */
     +	if (!transaction) {
     +		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
    @@ builtin/fetch.c: static int do_fetch(struct transport *transport,
     -			goto cleanup;
     +	if (retcode)
     +		goto cleanup;
    -+
    + 
    +-		retcode = ref_transaction_commit(transaction, &err);
     +	retcode = ref_transaction_commit(transaction, &err);
     +	if (retcode) {
     +		/*
    @@ builtin/fetch.c: static int do_fetch(struct transport *transport,
     +		transaction = NULL;
     +		goto cleanup;
     +	}
    - 
    --		retcode = ref_transaction_commit(transaction, &err);
    ++
     +	if (!atomic_fetch) {
     +		struct ref_rejection_data data = {
     +			.retcode = &retcode,
2:  1697d15527 ! 3:  85f68863d8 send-pack: fix memory leak around duplicate refs
    @@ Commit message
     
         If a reference fails to be updated, its error message is captured in the
         `ref->remote_status` field. While the command allows duplicate ref
    -    inputs, the list of doesn't accommodate this behavior as a particular
    +    inputs, the list doesn't accommodate this behavior as a particular
         refname is linked to a single `struct ref*` element. So if the user
         inputs a reference twice like:
     
3:  0579a66c98 ! 4:  1cd2324045 receive-pack: use batched reference updates
    @@ Commit message
         batch updates use a transaction under the hood. This explains the change
         in 't1416'.
     
    +    Helped-by: Patrick Steinhardt <ps@xxxxxx>
         Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
     
      ## builtin/receive-pack.c ##
    @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
     +					      enum ref_transaction_error err,
     +					      void *cb_data)
     +{
    -+	struct strmap *failed_refs = (struct strmap *)cb_data;
    -+	const char *reason = "";
    ++	struct strmap *failed_refs = cb_data;
     +
    -+	switch (err) {
    -+	case REF_TRANSACTION_ERROR_NAME_CONFLICT:
    -+		reason = "refname conflict";
    -+		break;
    -+	case REF_TRANSACTION_ERROR_CREATE_EXISTS:
    -+		reason = "reference already exists";
    -+		break;
    -+	case REF_TRANSACTION_ERROR_NONEXISTENT_REF:
    -+		reason = "reference does not exist";
    -+		break;
    -+	case REF_TRANSACTION_ERROR_INCORRECT_OLD_VALUE:
    -+		reason = "incorrect old value provided";
    -+		break;
    -+	case REF_TRANSACTION_ERROR_INVALID_NEW_VALUE:
    -+		reason = "invalid new value provided";
    -+		break;
    -+	case REF_TRANSACTION_ERROR_EXPECTED_SYMREF:
    -+		reason = "expected symref but found regular ref";
    -+		break;
    -+	default:
    -+		reason = "unkown failure";
    -+	}
    -+
    -+	strmap_put(failed_refs, refname, xstrdup(reason));
    ++	strmap_put(failed_refs, refname, ref_transaction_error_msg(err));
     +}
     +
      static void execute_commands_non_atomic(struct command *commands,
    @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
      {
      	struct command *cmd;
      	struct strbuf err = STRBUF_INIT;
    -+	const char *reported_error = "";
    ++	const char *reported_error = NULL;
     +	struct strmap failed_refs = STRMAP_INIT;
     +
     +	transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
    @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
     +		rp_error("%s", err.buf);
     +		reported_error = "failed to update refs";
     +		goto failure;
    - 	}
    ++	}
     +
     +	ref_transaction_for_each_rejected_update(transaction,
     +						 ref_transaction_rejection_handler,
    @@ builtin/receive-pack.c: static void BUG_if_skipped_connectivity_check(struct com
     +
     +failure:
     +	for (cmd = commands; cmd; cmd = cmd->next) {
    -+		if (strmap_empty(&failed_refs))
    ++		if (reported_error)
     +			cmd->error_string = reported_error;
     +		else if (strmap_contains(&failed_refs, cmd->ref_name))
     +			cmd->error_string = xstrdup(strmap_get(&failed_refs, cmd->ref_name));
    -+	}
    + 	}
     +
     +cleanup:
     +	ref_transaction_free(transaction);


base-commit: 7a1d2bd0a596f42a8a7a68d55577967bb454fec0
change-id: 20250423-501-update-git-fetch-1-to-use-partial-transactions-7fb66339fe79

Thanks
- Karthik





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux