Re: [PATCH] abbrev: allow extending beyond 20 chars to disambiguate

On 2025-08-11 at 15:26:32, Junio C Hamano wrote:
> When you have two or more objects with object names that share more
> than half the length of the hash algorithm in use (e.g. 10 bytes for
> SHA-1 that produces 20-byte/160-bit hash), find_unique_abbrev()
> fails to show disambiguation.

Is this really the case?  If the restriction is due to using
GIT_MAX_RAWSZ instead of GIT_MAX_HEXSZ, then that's 32 vs. 64 in our
modern codebase.

> To see how many leading letters of a given full object name are
> sufficiently unambiguous, the algorithm starts from an initial
> length, guessed based on the estimated number of objects in the
> repository, sees if another object shares the prefix, and
> keeps extending the abbreviation.  The loop stops at GIT_MAX_RAWSZ,
> which is counted as the number of bytes, since 5b20ace6 (sha1_name:
> unroll len loop in find_unique_abbrev_r(), 2017-10-08); before that
> change, it extended up to GIT_MAX_HEXSZ, which is the correct limit
> because the loop is adding one output letter per iteration.

Nicely explained.
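
To make the unit mismatch concrete, here is a toy model of the loop
described above.  It is only a sketch, not the code in
find_unique_abbrev_r(): the names TOY_RAWSZ, TOY_HEXSZ, and
toy_unique_abbrev_len() are made up for illustration, it compares
against a single other object name instead of walking the object
database, and it uses SHA-1-sized constants (20 raw bytes, 40 hex
digits).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constants mirroring the SHA-1 case. */
#define TOY_RAWSZ 20              /* hash size in raw bytes */
#define TOY_HEXSZ (2 * TOY_RAWSZ) /* hash size in hex digits */

/*
 * Return the shortest prefix length of `hex`, counted in hex digits,
 * that no longer matches `other`, starting from `start` and never
 * going past `limit`.  Each iteration adds one hex digit, so `limit`
 * has to be counted in hex digits as well.
 */
static size_t toy_unique_abbrev_len(const char *hex, const char *other,
				    size_t start, size_t limit)
{
	size_t len = start;

	assert(limit <= TOY_HEXSZ);
	assert(strlen(hex) == TOY_HEXSZ && strlen(other) == TOY_HEXSZ);

	/* Extend while the first `len` digits are still shared. */
	while (len < limit && !strncmp(hex, other, len))
		len++;
	return len;
}
```

With two names that share their first 24 hex digits, passing TOY_HEXSZ
as the limit lets the loop reach 25 digits, which is enough to tell
them apart.  Passing TOY_RAWSZ instead stops the loop at 20 digits,
where the two prefixes are still identical.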

>  * No tests added, since I do not think I want to find two valid
>    objects with their object names sharing the same prefix that is
>    more than 20 letters long.  The current abbreviation code happens
>    to ignore validity of the object and takes invalid objects into
>    account when disambiguating, but I do not want to see a test rely
>    on that.

Yes, even if we could efficiently create such a collision with SHA-1
using the best known attacks on it, that would still be 2^63.5, which
was estimated to cost about USD 10,000 in 2025.  I don't think doing
that just to produce a test would be a good use of the project's (or
really, anyone else's) funds.  Using SHA-256, of course, would require
at least 2^80 work.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA
