On Fri, Apr 4, 2025 at 3:58 AM Patrick Steinhardt <ps@xxxxxx> wrote: > > It was reported that using git-pull(1) in a repository whose remote > contains branches with emojis leads to the following bug: > > $ git pull > remote: Enumerating objects: 161255, done. > remote: Counting objects: 100% (55884/55884), done. > remote: Compressing objects: 100% (5518/5518), done. > remote: Total 161255 (delta 54253), reused 50509 (delta 50364), > pack-reused 105371 (from 4) > Receiving objects: 100% (161255/161255), 309.90 MiB | 16.87 MiB/s, done. > Resolving deltas: 100% (118048/118048), completed with 13416 local objects. > From github.com:github/github > 97ab7ae3f3745..8fb2f9fa180ed master -> origin/master > [...snip many screenfuls of updates to origin remotes...] > BUG: refs/packed-backend.c:984: packed-refs backend yielded reference > preceding its prefix > error: fetch died of signal 6 > > This issue bisects to 22600c04529 (refs/iterator: implement seeking for > packed-ref iterators, 2025-03-12) where we have implemented seeking for > the packed-ref iterator. As part of that change we introduced a check > that verifies that the iterator only returns refnames bigger than the > prefix. In theory, this check should always hold: when a prefix is set > we know that we would've seeked that prefix first, so we should never > see a reference sorting before that prefix. > > But in practice the check itself is misbehaving when handling unicode > characters. The particular issue triggered with a branch that got the > "shaved ice" unicode character in its name, which is composed of the > bytes "0xEE 0x90 0xBF". The bug triggers when we compare the refname > "refs/heads/<shaved-ice>" to something like "refs/heads/z", and it > specifically hits when comparing the first byte, "0xEE". > > The root cause is that the most-significant bit of 0xEE is set. The > `refname` and `prefix` pointers that we use to compare bytes with one > another are both pointers to signed characters. As such, when we > dereference the 0xEE byte the result is a _negative_ value, and this > value will of course compare smaller than "z". > > We can see that this issue is avoided in `cmp_packed_refname()`, where > we explicitly cast each byte to its unsigned form. Fix the bug by doing > the same in `packed_ref_iterator_advance()`. > > Reported-by: Elijah Newren <newren@xxxxxxxxx> > Signed-off-by: Patrick Steinhardt <ps@xxxxxx> > --- > Hi, > > this patch addresses the issue reported by Elijah at [1]. Thanks! Thanks for the fix! > Patrick > > [1]: <CABPp-BFBqC_t5QSexRQpYsqXBa11WK+OqGt167E=K=xod=buQw@xxxxxxxxxxxxxx> > --- > refs/packed-backend.c | 4 ++-- > t/t1408-packed-refs.sh | 15 +++++++++++++++ > 2 files changed, 17 insertions(+), 2 deletions(-) > > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index b4289a7d9ce..7e31904bd41 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > @@ -980,9 +980,9 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator) > continue; > > while (prefix && *prefix) { > - if (*refname < *prefix) > + if ((unsigned char)*refname < (unsigned char)*prefix) > BUG("packed-refs backend yielded reference preceding its prefix"); > - else if (*refname > *prefix) > + else if ((unsigned char)*refname > (unsigned char)*prefix) > return ITER_DONE; > prefix++; > refname++; Oh, right, I assumed there was going to be lots of other places that needed casting outside this function. If I would have merely checked the one other line in this function and updated it, I would have had the fix... > diff --git a/t/t1408-packed-refs.sh b/t/t1408-packed-refs.sh > index 41ba1f1d7fc..833477f0fa3 100755 > --- a/t/t1408-packed-refs.sh > +++ b/t/t1408-packed-refs.sh > @@ -42,4 +42,19 @@ test_expect_success 'no error from stale entry in packed-refs' ' > test_cmp expect actual > ' > > +test_expect_success 'list packed refs with unicode characters' ' > + test_when_finished "rm -rf repo" && > + git init repo && > + ( > + cd repo && > + test_commit --no-tag A && > + git update-ref refs/heads/ HEAD && > + git update-ref refs/heads/z HEAD && > + git pack-refs --all && > + printf "%s commit\trefs/heads/z\n" $(git rev-parse HEAD) >expect && > + git for-each-ref refs/heads/z >actual && > + test_cmp expect actual > + ) > +' > + > test_done > > --- > base-commit: 5b97a56fa0e7d580dc8865b73107407c9b3f0eff > change-id: 20250404-b4-pks-packed-backend-seek-with-utf8-668c182ddcf7 I also tested this on my backup of the repo from when I triggered the error, first making sure that I could still trigger again without this fix, and then that this fix handled that case. Works great, thanks!