Re: [PATCH 7/8] refs: stop unsetting REF_HAVE_OLD for log-only updates

Patrick Steinhardt <ps@xxxxxx> · Thu, 24 Jul 2025 09:42:48 +0200

On Wed, Jul 23, 2025 at 03:31:01PM -0500, Justin Tobler wrote:
> On 25/07/22 01:20PM, Patrick Steinhardt wrote:
> > The `REF_HAVE_OLD` flag indicates whether a given ref update has its old
> > object ID set. If so, the value of that field is used to verify whteher
> 
> s/whteher/whether/
> 
> > the current state of the reference matches this expected state. It is
> > thus an important part of mitigating races with a concurrent process
> > that updates the same set of references.
> > 
> > When writing reflogs though we explicitly unset that flag. This is a
> > sensible thing to do: the old state of reflog entry updates may not
> > necessarily match the current on-disk state of its accompanying ref, but
> > it's only intended to signal what old object ID we want to write into
> > the new reflog entry. For example when migrating refs we end up writing
> > many reflog entries for a single reference, and most likely those reflog
> > entries will have many different old object IDs.
> > 
> > But unsetting this flag also removes a useful signal, namely that the
> > caller _did_ provide an old object ID for a given reflog entry. This
> > signal is useful to determine whether we have to resolve the refname
> > manually to figure out the current state, or whether we should just go
> > with what the caller has provided.
> > 
> > This actually causes real issues when migrating reflogs, as we don't
> > know to actually use the caller-provided old object ID when writing
> > those entries. Instead, reflog entries simply end up with the all-zero
> > object ID.
> 
> Ok, if I'm understanding this correctly, the `REF_HAVE_OLD` flag is also
> required to actually record a provided old OID in the reflog entry. If it
> is not set, a NUL OID is recorded instead.

This fix is not sufficient to record the old object ID, so the commit
message is indeed misleading as it comes from a previous iteration of
this patch seires.

I want to have this signal so that in the next commit we can assert that
if the new `REF_LOG_USE_PROVIDED_OIDS` is set, that the caller also sets
both `REF_HAVE_OLD` and `REF_HAVE_NEW`. So this commit here isn't really
required anymore, but I think it's the right thing to do anyway, and the
additional safety check isn't too bad to have in my opinion.

I'll rewrite the commit message a bit.

> > Stop unsetting the flag so that we can use it as this described signal,
> > which we'll do in a subsequent commit. Skip checking the old object ID
> > for log-only updates so that we don't expect it to match the current
> > on-disk state.
> 
> Just to clarify, when migrating reflogs, are these operations always
> marked with `REF_LOG_ONLY`? The comment for that flag states:
> 
>   Used as a flag in ref_update::flags when we want to log a ref
>   update but not actually perform it.  This is used when a symbolic ref
>   update is split up.                                           
> 
> I might be misunderstanding this though.

The comment is out of date indeed. When queueing reflog updates we use
`ref_transaction_update_reflog()`, which sets the following flags:

  - REF_HAVE_OLD and REF_HAVE_NEW to indicate the object IDs.

  - REF_LOG_ONLY to skip writing the reference itself.

  - REF_FORCE_CREATE_REFLOG to disable heuristics whether or not the
    reflog entry should be written in the first place.

  - REF_NO_DEREF to not write the reflog for the target of a potential
    symref, but for the symref itself.

  - REF_LOG_USE_PROVIDED_OIDS to actually use the provided object IDs
    and not try to resolve state.

It's an awful lot of flags, and I tried to work on a solution where we
don't have to introduce the new flag. But the logic here is so intricate
that it always caused unintended side effects.

Thanks for your review!

Patrick