[replacement PATCH 4/5] describe: handle blob traversal with no commits

Jeff King <peff@xxxxxxxx> · Wed, 20 Aug 2025 02:30:34 -0400

On Wed, Aug 20, 2025 at 06:34:16AM +0200, Patrick Steinhardt wrote:

> > Ah, no. We did that in b229d18a80 (validate_headref: tighten
> > ref-matching to just branches, 2009-01-29), but had to revert it in
> > e9cc02f0e4 (symbolic-ref: allow refs/<whatever> in HEAD, 2009-02-13) to
> > keep compatibility for topgit. :(
> 
> Well, that's certainly from before my time in the Git project :) I guess
> changing semantics now would be quite risky. Reintroducing this change
> feels out of the picture, but an alternative one could think about is to
> validate that HEAD always points to a commit(-ish?).

Yeah, that's _probably_ OK. I don't remember how topgit works at all,
but I think its custom refs do at least point to commits.

> But ultimately I'm not sure it's even worth it. If people really want to
> shoot themselves into the foot they'll find a way to do so.

Yeah, agreed that it's probably not urgent.

> > Still, I'm not sure it's something I'd want to base a test on. Maybe if
> > there is a big comment that says "It is OK to invalidate and remove this
> > test if we ever tighten symbolic-ref" it would be OK?
> 
> That sounds reasonable to me, yeah.

OK. Here's a replacement for patch 4, then, with a test. Nothing else in
the series should need to be touched.

-- >8 --
Subject: [PATCH] describe: handle blob traversal with no commits

When describing a blob, we traverse from HEAD, remembering each commit
we saw, and then checking each blob to report the containing commit.
But if we haven't seen any commits at all, we'll segfault (we store the
"current" commit as an oid initialized to the null oid, causing
lookup_commit_reference() to return NULL).

This shouldn't be able to happen normally. We always start our traversal
at HEAD, which must be a commit (a property which is enforced by the
refs code). But you can trigger the segfault like this:

  blob=$(echo foo | git hash-object -w --stdin)
  echo $blob >.git/HEAD
  git describe $blob

We can instead catch this case and return an empty result, which hits
the usual "we didn't find $blob while traversing HEAD" error.

This is a minor lie in that we did "find" the blob. And this even hints
at a bigger problem in this code: what if the traversal pointed to the
blob as _not_ part of a commit at all, but we had previously filled in
the recorded "current commit"? One could imagine this happening due to a
tag pointing directly to the blob in question.

But that can't happen, because we only traverse from HEAD, never from
any other refs. And the intent of the blob-describing code is to find
blobs within commits.

So I think this matches the original intent as closely as we can (and
again, this segfault cannot be triggered without corrupting your
repository!).

The test here does not use the formula above, which works only for the
files backend (and not reftables). Instead we use another loophole to
create the bogus state using only Git commands. See the comment in the
test for details.

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
 builtin/describe.c  |  6 ++++--
 t/t6120-describe.sh | 16 ++++++++++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index f7bea3c8c5..72b2e1162c 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -507,8 +507,10 @@ static void process_object(struct object *obj, const char *path, void *data)
 
 	if (oideq(pcd->looking_for, &obj->oid) && !pcd->dst->len) {
 		reset_revision_walk();
-		describe_commit(&pcd->current_commit, pcd->dst);
-		strbuf_addf(pcd->dst, ":%s", path);
+		if (!is_null_oid(&pcd->current_commit)) {
+			describe_commit(&pcd->current_commit, pcd->dst);
+			strbuf_addf(pcd->dst, ":%s", path);
+		}
 		free_commit_list(pcd->revs->commits);
 		pcd->revs->commits = NULL;
 	}
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index feec57bcbc..2c70cc561a 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -423,6 +423,22 @@ test_expect_success 'describe blob on an unborn branch' '
 	test_grep "cannot search .* on an unborn branch" actual
 '
 
+# This test creates a repository state that we generally try to disallow: HEAD
+# is pointing to an object that is not a commit. The ref update code forbids
+# non-commit writes directly to HEAD or to any branch in refs/heads/.  But we
+# can use the loophole of pointing HEAD to another non-branch ref (something we
+# should forbid, but don't for historical reasons).
+#
+# Do not take this test as an endorsement of the loophole! If we ever tighten
+# it, it is reasonable to just drop this test entirely.
+test_expect_success 'describe blob on a non-commit HEAD' '
+	oldbranch=$(git symbolic-ref HEAD) &&
+	test_when_finished "git symbolic-ref HEAD $oldbranch" &&
+	git symbolic-ref HEAD refs/tags/test-blob &&
+	test_must_fail git describe test-blob 2>actual &&
+	test_grep "blob .* not reachable from HEAD" actual
+'
+
 test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
 	i=1 &&
 	while test $i -lt 8000
-- 
2.51.0.326.gecbb38d78e