On Fri, Apr 04, 2025 at 12:58:38PM +0200, Patrick Steinhardt wrote: > But in practice the check itself is misbehaving when handling unicode > characters. The particular issue triggered with a branch that got the > "shaved ice" unicode character in its name, which is composed of the > bytes "0xEE 0x90 0xBF". The bug triggers when we compare the refname > "refs/heads/<shaved-ice>" to something like "refs/heads/z", and it > specifically hits when comparing the first byte, "0xEE". > > The root cause is that the most-significant bit of 0xEE is set. The > `refname` and `prefix` pointers that we use to compare bytes with one > another are both pointers to signed characters. As such, when we > dereference the 0xEE byte the result is a _negative_ value, and this > value will of course compare smaller than "z". > > We can see that this issue is avoided in `cmp_packed_refname()`, where > we explicitly cast each byte to its unsigned form. Fix the bug by doing > the same in `packed_ref_iterator_advance()`. Ah, good catch. I think this signed-ness issue has come up before, long ago, but I don't remember the context. In theory any stable ordering is OK for sorting, but of course cmp_packed_refname() chose to use unsigned in order to match strcmp(), and the standard defines it as interpreting the bytes as unsigned. One of the enjoyable quirks of C. > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index b4289a7d9ce..7e31904bd41 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > @@ -980,9 +980,9 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator) > continue; > > while (prefix && *prefix) { > - if (*refname < *prefix) > + if ((unsigned char)*refname < (unsigned char)*prefix) > BUG("packed-refs backend yielded reference preceding its prefix"); > - else if (*refname > *prefix) > + else if ((unsigned char)*refname > (unsigned char)*prefix) > return ITER_DONE; > prefix++; > refname++; The patch itself looks good to me. > +test_expect_success 'list packed refs with unicode characters' ' > + test_when_finished "rm -rf repo" && > + git init repo && > + ( > + cd repo && > + test_commit --no-tag A && > + git update-ref refs/heads/ HEAD && > + git update-ref refs/heads/z HEAD && It's possible some filesystems might be unhappy with this character, but I guess we can see if anybody screams. > + git pack-refs --all && > + printf "%s commit\trefs/heads/z\n" $(git rev-parse HEAD) >expect && > + git for-each-ref refs/heads/z >actual && > + test_cmp expect actual This loses the exit code of rev-parse, but IMHO that is not a big deal. We'd notice the broken output when we call test_cmp. I don't know if people who are eagerly hunting down missed exit codes might flag it, though. Thanks for the quick turnaround on this (and to Elijah for reporting). -Peff