Re: Potential Null Pointer Dereference detected by static analysis tool

René Scharfe <l.s.r@xxxxxx> · Mon, 18 Aug 2025 21:56:35 +0200

On 8/18/25 7:05 AM, Jeff King wrote:
> On Mon, Aug 18, 2025 at 12:48:07AM -0400, Jeff King wrote:
> 
>> And indeed, the current code does then segfault on "git describe foo" at
>> the spot I mentioned. Even though the repository state here is
>> unexpected and corrupt, I do think we should probably be more defensive
>> and avoid the segfault.
> 
> So you almost nerd-sniped me into making a series. But the more I dug
> into the rabbit-hole, the more I turned away in disgust. :)
> 
> The set of problems I found are:
> 
>   1. What should happen when traversing from HEAD does not find the blob
>      in question? Right now we print a blank line, which is...weird.

Weird indeed.

>      Probably we should either print nothing, or return an error.

The latter, consistent with "git describe <commit-ish>".

> If we return an error, should we respect --always?

The documentation says "git describe <blob>" takes no options.  It could
learn some, of course.  But does it have to?  Perhaps better keep that
thing contained.

> Are we stuck with the
>      current dumb behavior because it's a plumbing command?

git describe is a porcelain command.

The documentation doesn't say what happens when the blob is not found
in HEAD's ancestry.

I can't imagine why someone would use "git describe <blob>" to check
whether a particular blob is linked to, but it _is_ slightly faster than
"git rev-list --objects --no-object-names HEAD | grep <blob>" for me,
and of course easier to type.  "git log --find-object <blob>" is slower
than either.

>   2. When we are on an unborn branch, we print a confusing message:
> 
>        $ git init
>        $ git commit --allow-empty -m foo
>        $ git tag foo
>        $ git symbolic-ref HEAD refs/heads/unborn
>        $ git describe $(echo blob | git hash-object -w --stdin)
>        fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
>        Use '--' to separate paths from revisions, like this:
>        'git <command> [<revision>...] -- [<file>...]'
> 
>      We should probably resolve HEAD ourselves and either bail with an
>      empty output or an error (depending on what we do for (1) above).

It already is an error, just needs a better message.  It should still
report an error even if we were to stick with showing blank lines for
unrelated blobs.

>   3. When we do traverse, if process_object() sees that we didn't find a
>      commit, we should detect that and either return an empty result or
>      an error (again, depending on the behavior of (2) above). This is
>      done by checking is_null_oid(&pcd->current_commit) there.

OK, ending the search right there might be the best option.  Traversing
deeper into the forest that we then know to be cursed would be the
unappealing alternative.

>   4. Then we can teach describe_commit() to take a commit rather than an
>      oid (and the is_null_oid() check becomes a NULL check).

  5. When process_object() has a commit, but it is indescribable, it
     shows an error:

     $ git describe 5afbe6da1d6ab0b8939343166636b828e561bf35
     fatal: No tags can describe '3b681e255cd8577a77983958ef7f566b05806cd0'.
     Try --always, or create some tags.

     It's not immediately clear that the reported hash belongs to the
     found commit.  And that suggestion to try --always is misleading,
     as "git describe <blob>" takes no options according to the
     documentation.  I'm not sure I like it in general -- can't tell
     if the command is being snarky with me.

> So it all depends on what to do with (1), and for a feature that IMHO
> should not even exist in the first place, I had trouble summoning the
> will-power to make this 4-patch series.

644eb60bd0 (builtin/describe.c: describe a blob, 2017-11-15) and
15af58c1ad (diffcore: add a pickaxe option to find a specific blob,
2018-01-04) confuse me; the latter's commit message sounds like the
former wasn't (supposed to be?) merged.

I think the issues you listed are independent, though.  Or what's wrong
with this demo that addresses point 3 in process_object() and 1 in
describe_blob().  If we want a blank line for 1 then we apply only
the first hunk.  Or am I missing something?

René

diff --git a/builtin/describe.c b/builtin/describe.c
index d7dd8139de..9e485240aa 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -507,8 +507,10 @@ static void process_object(struct object *obj, const char *path, void *data)
 
 	if (oideq(&pcd->looking_for, &obj->oid) && !pcd->dst->len) {
 		reset_revision_walk();
-		describe_commit(&pcd->current_commit, pcd->dst);
-		strbuf_addf(pcd->dst, ":%s", path);
+		if (!is_null_oid(&pcd->current_commit)) {
+			describe_commit(&pcd->current_commit, pcd->dst);
+			strbuf_addf(pcd->dst, ":%s", path);
+		}
 		free_commit_list(pcd->revs->commits);
 		pcd->revs->commits = NULL;
 	}
@@ -519,6 +521,7 @@ static void describe_blob(struct object_id oid, struct strbuf *dst)
 	struct rev_info revs;
 	struct strvec args = STRVEC_INIT;
 	struct process_commit_data pcd = { *null_oid(the_hash_algo), oid, dst, &revs};
+	size_t orig_len = dst->len;
 
 	strvec_pushl(&args, "internal: The first arg is not parsed",
 		     "--objects", "--in-commit-order", "--reverse", "HEAD",
@@ -532,6 +535,8 @@ static void describe_blob(struct object_id oid, struct strbuf *dst)
 		die("revision walk setup failed");
 
 	traverse_commit_list(&revs, process_commit, process_object, &pcd);
+	if (dst->len == orig_len)
+		die(_("unable to describe blob '%s'"), oid_to_hex(&oid));
 	reset_revision_walk();
 	release_revisions(&revs);
 	strvec_clear(&args);