Re: [PATCH v3 7/9] refs/files: detect race when generating reflog entry for HEAD

Patrick Steinhardt <ps@xxxxxx> · Mon, 4 Aug 2025 09:38:29 +0200

On Sat, Aug 02, 2025 at 07:11:28AM -0400, Jeff King wrote:
> On Tue, Jul 29, 2025 at 10:55:25AM +0200, Patrick Steinhardt wrote:
> 
> > Unfortunately, this change only helps with the second race. We cannot
> > reliably plug the first race without locking the HEAD reference at the
> > start of the transaction. Locking HEAD unconditionally would effectively
> > serialize all writes though, and that doesn't seem like an option. Also,
> > double checking its value at the end of the transaction is not an option
> > either, as its target may have flip-flopped during the transaction.
> 
> I agree we should not always take a lock on HEAD, since most refs would
> not need it. But I wonder if we could do better by examining HEAD, then
> taking a lock when we think we'll need it, and then re-checking the
> value of HEAD. That is still racy, though (somebody could have pointed
> HEAD at us between the two checks). Fundamentally the files backend is
> not atomic across the whole namespace, and we are trying to update two
> refs. So I think there will always be some race.

We do that though. When queueing the log-only update for HEAD we don't
lock immediately, but we lock once we process that log-only update. And
that's where we now do the check whether HEAD has changed meanwhile,
which should be race-free given that it did change indeed.

> It does make me wonder if this race-fix is even worth it, then. We are
> catching the case where somebody moves HEAD away from the ref we are
> updating while we are updating it. But without atomicity, do we even
> know which happened first? That is, would it be incorrect to update
> HEAD anyway? I guess the outcome is observable because their movement of
> HEAD generated a reflog entry, and thus the entries would be out of
> order. So maybe that is worth it.

So with the above I think that we are race-free for the case where HEAD
has been modified after we have decided to write a reflog entry for it.
We aren't though in the case where we _haven't_ decided to write a
reflog entry, as HEAD might have been adjusted to point to one of the
updated refs meanwhile as you rightfully point out.

But I think that's an acceptable tradeoff. I'd rather write no reflog
entry than an incorrect one.

> Anyway, I had two questions about the code:
> 
> > @@ -2600,7 +2607,36 @@ static enum ref_transaction_error lock_ref_for_update(struct files_ref_store *re
> >  
> >  	update->backend_data = lock;
> >  
> > -	if (update->type & REF_ISSYMREF) {
> > +	if (update->flags & REF_LOG_VIA_SPLIT) {
> > +		struct ref_lock *parent_lock;
> > +
> > +		if (!update->parent_update)
> > +			BUG("split update without a parent");
> > +
> > +		parent_lock = update->parent_update->backend_data;
> > +
> > +		/*
> > +		 * Check that "HEAD" didn't racily change since we have looked
> > +		 * it up. If it did we must refuse to write the reflog entry.
> > +		 *
> > +		 * Note that this does not catch all races: if "HEAD" was
> > +		 * racily changed to point to one of the refs part of the
> > +		 * transaction then we would miss writing the split reflog
> > +		 * entry for "HEAD".
> > +		 */
> > +		if (!(update->type & REF_ISSYMREF) ||
> > +		    strcmp(update->parent_update->refname, referent.buf)) {
> > +			strbuf_addstr(err, "HEAD has been racily updated");
> > +			ret = REF_TRANSACTION_ERROR_GENERIC;
> > +			goto out;
> > +		}
> 
> One, what happens with a multi-level ref (e.g., HEAD points to
> refs/heads/foo which points to refs/heads/bar)?
> 
> We've resolved HEAD to get referent.buf. Do we get "foo" or "bar" here?
> If "bar", then a write through "foo" will complain. But if we get "foo",
> then theoretically a write through "bar" will complain.
> 
> I _think_ we are OK, though. Constructing it like this:
> 
>   git init
>   git commit --allow-empty -m whatever
> 
>   git symbolic-ref refs/heads/foo refs/heads/bar
>   git symbolic-ref HEAD refs/heads/foo
>   git update-ref refs/heads/foo main
> 
> triggers the check and shows that our referent from lock_raw_ref() is
> the first level (i.e., "foo"). Which is good.
> 
> If we swap out "foo" for "bar" in the update-ref call, then we'd get a
> mismatch. But in that case we do not figure out that HEAD needs be
> written at all! That is, we only do a single level of look-back to
> decide whether to write HEAD at all. So as long as we keep doing so, we
> are OK.

Yeah, the whole code is best-effort only and doesn't work for all kinds
of edge cases. The mere fact that we only handle this case for HEAD is
already a limitation -- in theory we should do it for all symbolic refs.

> > +		if (!(update->type & REF_ISSYMREF) ||
> > +		    strcmp(update->parent_update->refname, referent.buf)) {
> > +			strbuf_addstr(err, "HEAD has been racily updated");
> > +			ret = REF_TRANSACTION_ERROR_GENERIC;
> > +			goto out;
> > +		}
> 
> And two, is an error the right thing here? The user asked us to update
> "foo", and we saw that HEAD pointed to it. So we decided to update
> HEAD's reflog, too. And when it came time to do so under lock, we found
> that HEAD did not point to "foo" any more.
> 
> Shouldn't we quietly drop the HEAD reflog update, rather than forcing
> the whole transaction to fail? The user never asked us to update HEAD at
> all. It was something we opportunistically decided to do, and now we
> find out that it is not appropriate to do so.

That's something I wasn't quite sure about, either. Honestly, the reason
I shied away from it is that it needs a bit more munging for an edge
case that is hard to test reliably. But I guess we can do something like
the below patch to skip writing the reflog message instea.

Patrick

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 905555365b..851b1b33f4 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -643,14 +643,16 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 	return 0;
 }
 
-static void unlock_ref(struct ref_lock *lock)
+static int unlock_ref(struct ref_lock *lock)
 {
 	lock->count--;
 	if (!lock->count) {
 		rollback_lock_file(&lock->lk);
 		free(lock->ref_name);
 		free(lock);
+		return 1;
 	}
+	return 0;
 }
 
 /*
@@ -2557,6 +2559,9 @@ struct files_transaction_backend_data {
  *   the referent to transaction.
  * - If it is an update of head_ref, add a corresponding REF_LOG_ONLY
  *   update of HEAD.
+ *
+ * Returns 0 on success, 1 in case the update needs to be dropped or a negative
+ * error code otherwise.
  */
 static enum ref_transaction_error lock_ref_for_update(struct files_ref_store *refs,
 						      struct ref_update *update,
@@ -2617,7 +2622,8 @@ static enum ref_transaction_error lock_ref_for_update(struct files_ref_store *re
 
 		/*
 		 * Check that "HEAD" didn't racily change since we have looked
-		 * it up. If it did we must refuse to write the reflog entry.
+		 * it up. If it did we remove the reflog-only updateg from the
+		 * transaction again.
 		 *
 		 * Note that this does not catch all races: if "HEAD" was
 		 * racily changed to point to one of the refs part of the
@@ -2626,8 +2632,16 @@ static enum ref_transaction_error lock_ref_for_update(struct files_ref_store *re
 		 */
 		if (!(update->type & REF_ISSYMREF) ||
 		    strcmp(update->parent_update->refname, referent.buf)) {
-			strbuf_addstr(err, "HEAD has been racily updated");
-			ret = REF_TRANSACTION_ERROR_GENERIC;
+			if (unlock_ref(lock))
+				strmap_remove(&backend_data->ref_locks,
+					      update->refname, 0);
+
+			memmove(transaction->updates + update_idx,
+				transaction->updates + update_idx + 1,
+				(transaction->nr - update_idx - 1) * sizeof(*transaction->updates));
+			transaction->nr--;
+
+			ret = 1;
 			goto out;
 		}
 
@@ -2896,6 +2910,8 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 					  head_ref, &refnames_to_check,
 					  err);
 		if (ret) {
+			if (ret > 0)
+				continue;
 			if (ref_transaction_maybe_set_rejected(transaction, i, ret)) {
 				strbuf_reset(err);
 				ret = 0;