Re: [PATCH v2] abbrev: allow extending beyond 20 chars to disambiguate

René Scharfe <l.s.r@xxxxxx> · Tue, 12 Aug 2025 17:59:56 +0200

On 8/12/25 5:17 PM, Junio C Hamano wrote:
> René Scharfe <l.s.r@xxxxxx> writes:
> 
>>>  	while (mad->hex[i] && mad->hex[i] == get_hex_char_from_oid(oid, i))
>>>  		i++;
>>>  
>>> -	if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
>>> +	if (mad->cur_len <= i && i < mad->max_len)
>>>  		mad->cur_len = i + 1;
>>
>> This combines two checks: Whether we can increment and whether the new
>> length is greater than the old one.  Only if both are true we take the
>> new length.  Shouldn't they be separate, though?  Why reject a new
>> length that happens to be the maximum?  And max_len is not explicitly
>> needed for the first check:
>>
>> 	/* One more to disambiguate, if possible. */
>> 	if (mad->hex[i])
>> 		i++;
>>
>> 	/* New record? */
>> 	if (i > mad->cur_len)
>> 		mad->cur_len = i;
>>
>> René
> 
> Great.
> 
> Your observation resolves my puzzlement about the first while() loop
> that has been bugging me ever since I started looking at this code.
> The mad->hex[] array is NUL terminated, and the loop can terminate
> correctly without being told about hexsz at all, and we ought to be
> able to use the same information to make sure we stop incrementing
> the .cur_len member without running beyond the end of the string.
> 
> In other words, wouldn't this be what we want, without any of the
> max_len crap?
> 
> diff --git c/object-name.c w/object-name.c
> index 11aa0e6afc..4cd1d38778 100644
> --- c/object-name.c
> +++ w/object-name.c
> @@ -704,7 +704,7 @@ static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
>  	while (mad->hex[i] && mad->hex[i] == get_hex_char_from_oid(oid, i))
>  		i++;
>  
> -	if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
> +	if (mad->hex[i] && i >= mad->cur_len)
>  		mad->cur_len = i + 1;
>  
>  	return 0;

This combines the two checks and I don't see why.  Why not update
cur_len when oid and mad->oid are the same (mad->hex[i] == '\0')?  Ah,
to _ignore_ duplicates on purpose, when we have the same object loose
and packed or in multiple packs, right?  We wouldn't want to let that
affect abbreviations, makes sense.

René