On 2025-08-11 at 15:26:32, Junio C Hamano wrote:
> When you have two or more objects with object names that share more
> than half the length of the hash algorithm in use (e.g. 10 bytes for
> SHA-1, which produces a 20-byte/160-bit hash), find_unique_abbrev()
> fails to show disambiguation.

Is this really the case?  If the restriction is due to using
GIT_MAX_RAWSZ instead of GIT_MAX_HEXSZ, then that's 32 vs. 64 in our
modern codebase.

> To see how many leading letters of a given full object name are
> sufficient to be unambiguous, the algorithm starts from an initial
> length, guessed based on the estimated number of objects in the
> repository, checks whether another object shares the prefix, and
> keeps extending the abbreviation.  The loop stops at GIT_MAX_RAWSZ,
> which is a byte count, since 5b20ace6 (sha1_name: unroll len loop
> in find_unique_abbrev_r(), 2017-10-08); before that change, it
> extended up to GIT_MAX_HEXSZ, which is the correct limit because
> the loop is adding one output letter per iteration.

Nicely explained.  (A quick sketch of that loop is at the end of this
message.)

> * No tests added, since I do not think I want to find two valid
>   objects with their object names sharing the same prefix that is
>   more than 20 letters long.  The current abbreviation code happens
>   to ignore validity of the object and takes invalid objects into
>   account when disambiguating, but I do not want to see a test rely
>   on that.

Yes, even if we could efficiently create such a collision with SHA-1
using the best known attacks on it, that would still take about
2^63.5 operations, which was estimated to cost about USD 10,000 in
2025.  I don't think doing that just to produce a test would be a
good use of the project's (or really, anyone else's) funds.  Using
SHA-256, of course, would require at least 2^80 work.
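
To make the bytes-vs-letters confusion concrete, here is a minimal
sketch of the kind of loop the patch describes.  This is not the
actual find_unique_abbrev_r() code; prefix_is_ambiguous() and
shortest_unique_len() are made-up stand-ins for the real object
lookup and search:

#include <stddef.h>

#define GIT_MAX_RAWSZ 32			/* bytes in a SHA-256 hash */
#define GIT_MAX_HEXSZ (2 * GIT_MAX_RAWSZ)	/* 64 hex characters */

/* Hypothetical stand-in for the real disambiguation lookup. */
static int prefix_is_ambiguous(const char *hex, size_t len)
{
	return len < 7;	/* pretend short prefixes always collide */
}

static size_t shortest_unique_len(const char *hex, size_t len)
{
	/*
	 * "len" counts output letters (hex digits), one added per
	 * iteration, so the correct upper bound is GIT_MAX_HEXSZ.
	 * Writing GIT_MAX_RAWSZ here instead caps the search at 32
	 * letters, halfway through a full 64-letter SHA-256 name.
	 */
	while (len < GIT_MAX_HEXSZ && prefix_is_ambiguous(hex, len))
		len++;
	return len;
}

Since the loop adds one hex letter per pass, a bound expressed in
bytes (32) rather than hex characters (64) halves the reachable
abbreviation length, which is the 32 vs. 64 distinction above.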
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA