Re: [PATCH] refs/packed: fix BUG when seeking refs with UTF-8 characters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 4, 2025 at 3:58 AM Patrick Steinhardt <ps@xxxxxx> wrote:
>
> It was reported that using git-pull(1) in a repository whose remote
> contains branches with emojis leads to the following bug:
>
>     $ git pull
>     remote: Enumerating objects: 161255, done.
>     remote: Counting objects: 100% (55884/55884), done.
>     remote: Compressing objects: 100% (5518/5518), done.
>     remote: Total 161255 (delta 54253), reused 50509 (delta 50364),
>     pack-reused 105371 (from 4)
>     Receiving objects: 100% (161255/161255), 309.90 MiB | 16.87 MiB/s, done.
>     Resolving deltas: 100% (118048/118048), completed with 13416 local objects.
>     From github.com:github/github
>        97ab7ae3f3745..8fb2f9fa180ed  master -> origin/master
>     [...snip many screenfuls of updates to origin remotes...]
>     BUG: refs/packed-backend.c:984: packed-refs backend yielded reference
>     preceding its prefix
>     error: fetch died of signal 6
>
> This issue bisects to 22600c04529 (refs/iterator: implement seeking for
> packed-ref iterators, 2025-03-12) where we have implemented seeking for
> the packed-ref iterator. As part of that change we introduced a check
> that verifies that the iterator only returns refnames bigger than the
> prefix. In theory, this check should always hold: when a prefix is set
> we know that we would've seeked that prefix first, so we should never
> see a reference sorting before that prefix.
>
> But in practice the check itself is misbehaving when handling unicode
> characters. The particular issue triggered with a branch that got the
> "shaved ice" unicode character in its name, which is composed of the
> bytes "0xEE 0x90 0xBF". The bug triggers when we compare the refname
> "refs/heads/<shaved-ice>" to something like "refs/heads/z", and it
> specifically hits when comparing the first byte, "0xEE".
>
> The root cause is that the most-significant bit of 0xEE is set. The
> `refname` and `prefix` pointers that we use to compare bytes with one
> another are both pointers to signed characters. As such, when we
> dereference the 0xEE byte the result is a _negative_ value, and this
> value will of course compare smaller than "z".
>
> We can see that this issue is avoided in `cmp_packed_refname()`, where
> we explicitly cast each byte to its unsigned form. Fix the bug by doing
> the same in `packed_ref_iterator_advance()`.
>
> Reported-by: Elijah Newren <newren@xxxxxxxxx>
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
> Hi,
>
> this patch addresses the issue reported by Elijah at [1]. Thanks!

Thanks for the fix!

> Patrick
>
> [1]: <CABPp-BFBqC_t5QSexRQpYsqXBa11WK+OqGt167E=K=xod=buQw@xxxxxxxxxxxxxx>
> ---
>  refs/packed-backend.c  |  4 ++--
>  t/t1408-packed-refs.sh | 15 +++++++++++++++
>  2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index b4289a7d9ce..7e31904bd41 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -980,9 +980,9 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
>                         continue;
>
>                 while (prefix && *prefix) {
> -                       if (*refname < *prefix)
> +                       if ((unsigned char)*refname < (unsigned char)*prefix)
>                                 BUG("packed-refs backend yielded reference preceding its prefix");
> -                       else if (*refname > *prefix)
> +                       else if ((unsigned char)*refname > (unsigned char)*prefix)
>                                 return ITER_DONE;
>                         prefix++;
>                         refname++;

Oh, right, I assumed there was going to be lots of other places that
needed casting outside this function.  If I would have merely checked
the one other line in this function and updated it, I would have had
the fix...

> diff --git a/t/t1408-packed-refs.sh b/t/t1408-packed-refs.sh
> index 41ba1f1d7fc..833477f0fa3 100755
> --- a/t/t1408-packed-refs.sh
> +++ b/t/t1408-packed-refs.sh
> @@ -42,4 +42,19 @@ test_expect_success 'no error from stale entry in packed-refs' '
>         test_cmp expect actual
>  '
>
> +test_expect_success 'list packed refs with unicode characters' '
> +       test_when_finished "rm -rf repo" &&
> +       git init repo &&
> +       (
> +               cd repo &&
> +               test_commit --no-tag A &&
> +               git update-ref refs/heads/ HEAD &&
> +               git update-ref refs/heads/z HEAD &&
> +               git pack-refs --all &&
> +               printf "%s commit\trefs/heads/z\n" $(git rev-parse HEAD) >expect &&
> +               git for-each-ref refs/heads/z >actual &&
> +               test_cmp expect actual
> +       )
> +'
> +
>  test_done
>
> ---
> base-commit: 5b97a56fa0e7d580dc8865b73107407c9b3f0eff
> change-id: 20250404-b4-pks-packed-backend-seek-with-utf8-668c182ddcf7

I also tested this on my backup of the repo from when I triggered the
error, first making sure that I could still trigger again without this
fix, and then that this fix handled that case.  Works great, thanks!





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux