[PATCH v2] abbrev: allow extending beyond 20 chars to disambiguate

When two or more objects have object names that share a prefix
longer than half the length of the hash in use (e.g. 10 bytes for
SHA-1, which produces a 20-byte/160-bit hash), find_unique_abbrev()
fails to produce an abbreviation that disambiguates them.

To find how many leading letters of a given full object name are
sufficient to make it unambiguous, the algorithm starts from an
initial length, guessed based on the estimated number of objects in
the repository, checks whether another object shares the prefix,
and keeps extending the abbreviation while one does.  The loop stops
at GIT_MAX_RAWSZ, which is counted in bytes, since 5b20ace6
(sha1_name: unroll len loop in find_unique_abbrev_r(), 2017-10-08);
before that change, it extended up to GIT_SHA1_HEXSZ, which was the
correct limit because the loop adds one output letter per iteration
and back then SHA-256 was not in the picture.

Pass the maximum length, in hexadecimal digits, of the hash in use
in the current repository down the code path, and use it as the
limit when updating the abbreviation length required to make the
name unique.

Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
---
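
 As a rough illustration (not part of the patch), the effect of the
 limit can be modelled outside Git.  This is only a sketch: the
 extend_len() helper and the LIMIT_BYTES/LIMIT_HEX constants are
 hypothetical stand-ins that use the SHA-1 sizes from the example
 above, and object names are plain hex strings rather than
 struct object_id.

```c
#include <string.h>

/* Hypothetical limits for this sketch, using SHA-1 sizes. */
#define LIMIT_BYTES 20	/* hash length counted in bytes */
#define LIMIT_HEX   40	/* hash length counted in hex digits */

/*
 * Simplified model of the callback: compare our name with another
 * object's name (both hex strings) and return the abbreviation
 * length needed to tell them apart, never growing past "limit".
 */
static unsigned int extend_len(const char *hex, const char *other,
			       unsigned int cur_len, unsigned int limit)
{
	unsigned int i = 0;

	/* count the hex digits the two names have in common */
	while (hex[i] && hex[i] == other[i])
		i++;

	/* one more digit than the shared prefix is needed */
	if (cur_len <= i && i < limit)
		cur_len = i + 1;

	return cur_len;
}
```

 For two made-up names that share their first 24 hex digits and a
 starting length of 7, the byte-sized limit leaves the length at 7
 (still ambiguous), while the hex-sized limit extends it to 25.
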
 object-name.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/object-name.c b/object-name.c
index 11aa0e6afc..8f9af57c0a 100644
--- a/object-name.c
+++ b/object-name.c
@@ -680,6 +680,7 @@ static unsigned msb(unsigned long val)
 struct min_abbrev_data {
 	unsigned int init_len;
 	unsigned int cur_len;
+	unsigned int max_len;
 	char *hex;
 	struct repository *repo;
 	const struct object_id *oid;
@@ -699,12 +700,12 @@ static inline char get_hex_char_from_oid(const struct object_id *oid,
 static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
 {
 	struct min_abbrev_data *mad = cb_data;
-
 	unsigned int i = mad->init_len;
+
 	while (mad->hex[i] && mad->hex[i] == get_hex_char_from_oid(oid, i))
 		i++;
 
-	if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
+	if (mad->cur_len <= i && i < mad->max_len)
 		mad->cur_len = i + 1;
 
 	return 0;
@@ -864,6 +865,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
 	mad.repo = r;
 	mad.init_len = len;
 	mad.cur_len = len;
+	mad.max_len = hexsz;
 	mad.hex = hex;
 	mad.oid = oid;
 

Range-diff:
1:  2e1d2b4ef6 ! 1:  5c67e57f14 abbrev: allow extending beyond 20 chars to disambiguate
    @@ Commit message
         keeps extending the abbreviation.  The loop stops at GIT_MAX_RAWSZ,
         which is counted as the number of bytes, since 5b20ace6 (sha1_name:
         unroll len loop in find_unique_abbrev_r(), 2017-10-08); before that
    -    change, it extended up to GIT_MAX_HEXSZ, which is the correct limit
    -    because the loop is adding one output letter per iteration.
    +    change, it extended up to GIT_SHA1_HEXSZ, which was the correct
    +    limit because the loop is adding one output letter per iteration and
    +    back then SHA256 was not in the picture.
    +
    +    Pass the max length of the hash being in use in the current
    +    repository down the code path, and use it to compute the code to
    +    update the abbreviation length required to make it unique.
     
         Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
     
      ## object-name.c ##
    -@@ object-name.c: static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
    +@@ object-name.c: static unsigned msb(unsigned long val)
    + struct min_abbrev_data {
    + 	unsigned int init_len;
    + 	unsigned int cur_len;
    ++	unsigned int max_len;
    + 	char *hex;
    + 	struct repository *repo;
    + 	const struct object_id *oid;
    +@@ object-name.c: static inline char get_hex_char_from_oid(const struct object_id *oid,
    + static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
    + {
    + 	struct min_abbrev_data *mad = cb_data;
    +-
    + 	unsigned int i = mad->init_len;
    ++
      	while (mad->hex[i] && mad->hex[i] == get_hex_char_from_oid(oid, i))
      		i++;
      
     -	if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
    -+	if (i < GIT_MAX_HEXSZ && i >= mad->cur_len)
    ++	if (mad->cur_len <= i && i < mad->max_len)
      		mad->cur_len = i + 1;
      
      	return 0;
    +@@ object-name.c: int repo_find_unique_abbrev_r(struct repository *r, char *hex,
    + 	mad.repo = r;
    + 	mad.init_len = len;
    + 	mad.cur_len = len;
    ++	mad.max_len = hexsz;
    + 	mad.hex = hex;
    + 	mad.oid = oid;
    + 
-- 
2.51.0-rc1-144-g869f44a1ca




