Re: [PATCH] refs/packed: fix BUG when seeking refs with UTF-8 characters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 04, 2025 at 12:58:38PM +0200, Patrick Steinhardt wrote:

> But in practice the check itself is misbehaving when handling unicode
> characters. The particular issue triggered with a branch that got the
> "shaved ice" unicode character in its name, which is composed of the
> bytes "0xEE 0x90 0xBF". The bug triggers when we compare the refname
> "refs/heads/<shaved-ice>" to something like "refs/heads/z", and it
> specifically hits when comparing the first byte, "0xEE".
> 
> The root cause is that the most-significant bit of 0xEE is set. The
> `refname` and `prefix` pointers that we use to compare bytes with one
> another are both pointers to signed characters. As such, when we
> dereference the 0xEE byte the result is a _negative_ value, and this
> value will of course compare smaller than "z".
> 
> We can see that this issue is avoided in `cmp_packed_refname()`, where
> we explicitly cast each byte to its unsigned form. Fix the bug by doing
> the same in `packed_ref_iterator_advance()`.

Ah, good catch. I think this signed-ness issue has come up before, long
ago, but I don't remember the context. In theory any stable ordering is
OK for sorting, but of course cmp_packed_refname() chose to use unsigned
in order to match strcmp(), and the standard defines it as interpreting
the bytes as unsigned. One of the enjoyable quirks of C.

> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index b4289a7d9ce..7e31904bd41 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -980,9 +980,9 @@ static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  			continue;
>  
>  		while (prefix && *prefix) {
> -			if (*refname < *prefix)
> +			if ((unsigned char)*refname < (unsigned char)*prefix)
>  				BUG("packed-refs backend yielded reference preceding its prefix");
> -			else if (*refname > *prefix)
> +			else if ((unsigned char)*refname > (unsigned char)*prefix)
>  				return ITER_DONE;
>  			prefix++;
>  			refname++;

The patch itself looks good to me.

> +test_expect_success 'list packed refs with unicode characters' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit --no-tag A &&
> +		git update-ref refs/heads/ HEAD &&
> +		git update-ref refs/heads/z HEAD &&

It's possible some filesystems might be unhappy with this character, but
I guess we can see if anybody screams.

> +		git pack-refs --all &&
> +		printf "%s commit\trefs/heads/z\n" $(git rev-parse HEAD) >expect &&
> +		git for-each-ref refs/heads/z >actual &&
> +		test_cmp expect actual

This loses the exit code of rev-parse, but IMHO that is not a big deal.
We'd notice the broken output when we call test_cmp. I don't know if
people who are eagerly hunting down missed exit codes might flag it,
though.

Thanks for the quick turnaround on this (and to Elijah for reporting).

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux