Re: [PATCH v2 7/8] refs: stop unsetting REF_HAVE_OLD for log-only updates

Patrick Steinhardt <ps@xxxxxx> · Mon, 28 Jul 2025 16:43:12 +0200

On Fri, Jul 25, 2025 at 07:36:10AM -0400, Jeff King wrote:
> On Fri, Jul 25, 2025 at 08:58:29AM +0200, Patrick Steinhardt wrote:
> 
> > The `REF_HAVE_OLD` flag indicates whether a given ref update has its old
> > object ID set. If so, the value of that field is used to verify whether
> > the current state of the reference matches this expected state. It is
> > thus an important part of mitigating races with a concurrent process
> > that updates the same set of references.
> > 
> > When writing reflogs though we explicitly unset that flag. This is a
> > sensible thing to do: the old state of reflog entry updates may not
> > necessarily match the current on-disk state of its accompanying ref, but
> > it's only intended to signal what old object ID we want to write into
> > the new reflog entry. For example when migrating refs we end up writing
> > many reflog entries for a single reference, and most likely those reflog
> > entries will have many different old object IDs.
> > 
> > But unsetting this flag also removes a useful signal, namely that the
> > caller _did_ provide an old object ID for a given reflog entry. This
> > signal will become useful in a subsequent commit, where we add a new
> > flag that tells the transaction to use the provided old and new object
> > IDs to write a reflog entry. The `REF_HAVE_OLD` flag is then used as a
> > signal to verify that the caller really did provide an old object ID.
> > 
> > Stop unsetting the flag so that we can use it as this described signal
> > in a subsequent commit. Skip checking the old object ID for log-only
> > updates so that we don't expect it to match the current on-disk state.
> 
> I like this direction, but I happened to be working in this area
> yesterday[1] and noticed something interesting. You're effectively
> replacing this removal of the HAVE_OLD flag when split a symref update:
> 
> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index bf6f89b1d19..8b42fe18901 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -2493,7 +2493,6 @@ static enum ref_transaction_error split_symref_update(struct ref_update *update,
> >  	 * done when new_update is processed.
> >  	 */
> >  	update->flags |= REF_LOG_ONLY | REF_NO_DEREF;
> > -	update->flags &= ~REF_HAVE_OLD;
> >  
> >  	return 0;
> >  }
> 
> and then later we get the same logic by checking for LOG_ONLY:
> 
> > @@ -2508,8 +2507,9 @@ static enum ref_transaction_error check_old_oid(struct ref_update *update,
> >  						struct object_id *oid,
> >  						struct strbuf *err)
> >  {
> > -	if (!(update->flags & REF_HAVE_OLD) ||
> > -		   oideq(oid, &update->old_oid))
> > +	if (update->flags & REF_LOG_ONLY ||
> > +	    !(update->flags & REF_HAVE_OLD) ||
> > +	    oideq(oid, &update->old_oid))
> >  		return 0;
> >  
> >  	if (is_null_oid(&update->old_oid)) {
> 
> Which make sense to me. But the weird thing I noticed is that when we do
> something similar for split_head_update(), we don't strip REF_HAVE_OLD!

And we shouldn't do that, as in the next commit we actually build on
always having `REF_HAVE_OLD` set for reflog-only updates. So I'd argue
that the problem is actually the other way round: when splitting off the
HEAD update we must resolve the old object ID if `REF_HAVE_OLD` is not
set.

> (For those not familiar with that