On Sat, Aug 02, 2025 at 07:11:28AM -0400, Jeff King wrote: > On Tue, Jul 29, 2025 at 10:55:25AM +0200, Patrick Steinhardt wrote: > > > Unfortunately, this change only helps with the second race. We cannot > > reliably plug the first race without locking the HEAD reference at the > > start of the transaction. Locking HEAD unconditionally would effectively > > serialize all writes though, and that doesn't seem like an option. Also, > > double checking its value at the end of the transaction is not an option > > either, as its target may have flip-flopped during the transaction. > > I agree we should not always take a lock on HEAD, since most refs would > not need it. But I wonder if we could do better by examining HEAD, then > taking a lock when we think we'll need it, and then re-checking the > value of HEAD. That is still racy, though (somebody could have pointed > HEAD at us between the two checks). Fundamentally the files backend is > not atomic across the whole namespace, and we are trying to update two > refs. So I think there will always be some race. We do that though. When queueing the log-only update for HEAD we don't lock immediately, but we lock once we process that log-only update. And that's where we now do the check whether HEAD has changed meanwhile, which should be race-free given that it did change indeed. > It does make me wonder if this race-fix is even worth it, then. We are > catching the case where somebody moves HEAD away from the ref we are > updating while we are updating it. But without atomicity, do we even > know which happened first? That is, would it be incorrect to update > HEAD anyway? I guess the outcome is observable because their movement of > HEAD generated a reflog entry, and thus the entries would be out of > order. So maybe that is worth it. So with the above I think that we are race-free for the case where HEAD has been modified after we have decided to write a reflog entry for it. We aren't though in the case where we _haven't_ decided to write a reflog entry, as HEAD might have been adjusted to point to one of the updated refs meanwhile as you rightfully point out. But I think that's an acceptable tradeoff. I'd rather write no reflog entry than an incorrect one. > Anyway, I had two questions about the code: > > > @@ -2600,7 +2607,36 @@ static enum ref_transaction_error lock_ref_for_update(struct files_ref_store *re > > > > update->backend_data = lock; > > > > - if (update->type & REF_ISSYMREF) { > > + if (update->flags & REF_LOG_VIA_SPLIT) { > > + struct ref_lock *parent_lock; > > + > > + if (!update->parent_update) > > + BUG("split update without a parent"); > > + > > + parent_lock = update->parent_update->backend_data; > > + > > + /* > > + * Check that "HEAD" didn't racily change since we have looked > > + * it up. If it did we must refuse to write the reflog entry. > > + * > > + * Note that this does not catch all races: if "HEAD" was > > + * racily changed to point to one of the refs part of the > > + * transaction then we would miss writing the split reflog > > + * entry for "HEAD". > > + */ > > + if (!(update->type & REF_ISSYMREF) || > > + strcmp(update->parent_update->refname, referent.buf)) { > > + strbuf_addstr(err, "HEAD has been racily updated"); > > + ret = REF_TRANSACTION_ERROR_GENERIC; > > + goto out; > > + } > > One, what happens with a multi-level ref (e.g., HEAD points to > refs/heads/foo which points to refs/heads/bar)? > > We've resolved HEAD to get referent.buf. Do we get "foo" or "bar" here? > If "bar", then a write through "foo" will complain. But if we get "foo", > then theoretically a write through "bar" will complain. > > I _think_ we are OK, though. Constructing it like this: > > git init > git commit --allow-empty -m whatever > > git symbolic-ref refs/heads/foo refs/heads/bar > git symbolic-ref HEAD refs/heads/foo > git update-ref refs/heads/foo main > > triggers the check and shows that our referent from lock_raw_ref() is > the first level (i.e., "foo"). Which is good. > > If we swap out "foo" for "bar" in the update-ref call, then we'd get a > mismatch. But in that case we do not figure out that HEAD needs be > written at all! That is, we only do a single level of look-back to > decide whether to write HEAD at all. So as long as we keep doing so, we > are OK. Yeah, the whole code is best-effort only and doesn't work for all kinds of edge cases. The mere fact that we only handle this case for HEAD is already a limitation -- in theory we should do it for all symbolic refs. > > + if (!(update->type & REF_ISSYMREF) || > > + strcmp(update->parent_update->refname, referent.buf)) { > > + strbuf_addstr(err, "HEAD has been racily updated"); > > + ret = REF_TRANSACTION_ERROR_GENERIC; > > + goto out; > > + } > > And two, is an error the right thing here? The user asked us to update > "foo", and we saw that HEAD pointed to it. So we decided to update > HEAD's reflog, too. And when it came time to do so under lock, we found > that HEAD did not point to "foo" any more. > > Shouldn't we quietly drop the HEAD reflog update, rather than forcing > the whole transaction to fail? The user never asked us to update HEAD at > all. It was something we opportunistically decided to do, and now we > find out that it is not appropriate to do so. That's something I wasn't quite sure about, either. Honestly, the reason I shied away from it is that it needs a bit more munging for an edge case that is hard to test reliably. But I guess we can do something like the below patch to skip writing the reflog message instea. Patrick diff --git a/refs/files-backend.c b/refs/files-backend.c index 905555365b..851b1b33f4 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -643,14 +643,16 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop, return 0; } -static void unlock_ref(struct ref_lock *lock) +static int unlock_ref(struct ref_lock *lock) { lock->count--; if (!lock->count) { rollback_lock_file(&lock->lk); free(lock->ref_name); free(lock); + return 1; } + return 0; } /* @@ -2557,6 +2559,9 @@ struct files_transaction_backend_data { * the referent to transaction. * - If it is an update of head_ref, add a corresponding REF_LOG_ONLY * update of HEAD. + * + * Returns 0 on success, 1 in case the update needs to be dropped or a negative + * error code otherwise. */ static enum ref_transaction_error lock_ref_for_update(struct files_ref_store *refs, struct ref_update *update, @@ -2617,7 +2622,8 @@ static enum ref_transaction_error lock_ref_for_update(struct files_ref_store *re /* * Check that "HEAD" didn't racily change since we have looked - * it up. If it did we must refuse to write the reflog entry. + * it up. If it did we remove the reflog-only updateg from the + * transaction again. * * Note that this does not catch all races: if "HEAD" was * racily changed to point to one of the refs part of the @@ -2626,8 +2632,16 @@ static enum ref_transaction_error lock_ref_for_update(struct files_ref_store *re */ if (!(update->type & REF_ISSYMREF) || strcmp(update->parent_update->refname, referent.buf)) { - strbuf_addstr(err, "HEAD has been racily updated"); - ret = REF_TRANSACTION_ERROR_GENERIC; + if (unlock_ref(lock)) + strmap_remove(&backend_data->ref_locks, + update->refname, 0); + + memmove(transaction->updates + update_idx, + transaction->updates + update_idx + 1, + (transaction->nr - update_idx - 1) * sizeof(*transaction->updates)); + transaction->nr--; + + ret = 1; goto out; } @@ -2896,6 +2910,8 @@ static int files_transaction_prepare(struct ref_store *ref_store, head_ref, &refnames_to_check, err); if (ret) { + if (ret > 0) + continue; if (ref_transaction_maybe_set_rejected(transaction, i, ret)) { strbuf_reset(err); ret = 0;