Michael Schroeder <mls@xxxxxxx> writes: > If a submodule uses a different hash algorithm than used in > the main repository, the recorded submodule commit is padded > with zeros. Hmph. I suspect that this "extra zero bits" would happen when your superproject uses a longer hash (e.g. SHA-256) than the hash used by the submodule you are fetching? If the arrangement is reversed, would we see a different and possibly an even severe problem? I do not mean to say that you need to solve the problem in both direction at all. But perhaps ... uses a hash algorithm with shorter hash length than what is used in the main repository, ... would clarify what problem you are solving in this patch better. > This is usually not a problem as the default is to > do submodule clones non-shallow and the commit can be found > in the local objects. > > But this is not true if the --shallow-submodules clone option is > used (or the --depth option in the submodule update call). > In this case, the commit is often not reachable and a fetch of the > specific commit is done. But the fetch cannot deal with the zero > padding and interprets the commit as a name. Because of this, > the checkout will fail. > > Implement truncation of the recorded commit to the correct size > corresponding to the hash algorithm used in the submodule. When we use the verb "truncate" in the context of this project, I think we always use the word to mean making it shorter than its natural length. But in the above, the word is used to remove excess so that we use it at its natural length. I cannot offhand offer a better phrase, so I'll let it go for now, but suggestions for better wording is very much welcome. > builtin/submodule--helper.c | 18 ++++++++++++------ > 1 file changed, 12 insertions(+), 6 deletions(-) > > diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c > index 07a1935cbe..ef21eb42b8 100644 > --- a/builtin/submodule--helper.c > +++ b/builtin/submodule--helper.c > @@ -72,7 +72,7 @@ static char *resolve_relative_url(const char *rel_url, const char *up_path, int > return resolved_url; > } > > -static int get_default_remote_submodule(const char *module_path, char **default_remote) > +static int get_default_remote_submodule(const char *module_path, char **default_remote, const struct git_hash_algo **hash_algo) > { > const struct submodule *sub; > struct repository subrepo; > @@ -106,6 +106,9 @@ static int get_default_remote_submodule(const char *module_path, char **default_ > > *default_remote = xstrdup(remote_name); > > + if (hash_algo) > + *hash_algo = subrepo.hash_algo; > + > repo_clear(&subrepo); > free(url); > > @@ -1272,7 +1275,7 @@ static void sync_submodule(const char *path, const char *prefix, > goto cleanup; > > strbuf_reset(&sb); > - code = get_default_remote_submodule(path, &default_remote); > + code = get_default_remote_submodule(path, &default_remote, NULL); > if (code) > exit(code); > > @@ -2319,16 +2322,19 @@ static int fetch_in_submodule(const char *module_path, int depth, int quiet, > if (depth) > strvec_pushf(&cp.args, "--depth=%d", depth); > if (oid) { > - char *hex = oid_to_hex(oid); > + char hexbuffer[GIT_MAX_HEXSZ + 1]; > + char *hex = oid_to_hex_r(hexbuffer, oid); > char *remote; > + const struct git_hash_algo *hash_algo = NULL; > int code; > > - code = get_default_remote_submodule(module_path, &remote); > + code = get_default_remote_submodule(module_path, &remote, &hash_algo); It feels highly unsatisfying to have oid_to_hex_r() blindly read and convert oid to hex (assuming the hash length of the current repository) without using the knowledge of how much of that object name is valid. We obtain that knowledge shortly after we call oid_to_hex_r() in hash_algo here. That is the only reason why we manually have to truncate below, isn't it? In other words, shouldn't we be finding out hash_algo first, and then use something like hash_to_hex_algop_r() to convert only the valid bits of the oid into the buffer?. That way, we do not have to manually add '\0' in this function below. > if (code) { > child_process_clear(&cp); > return code; > } > - > + if (hash_algo) > + hex[hash_algo->hexsz] = 0; /* truncate to correct size */ > strvec_pushl(&cp.args, remote, hex, NULL); > free(remote); > } > @@ -2635,7 +2641,7 @@ static int update_submodule(struct update_data *update_data) > char *remote_ref; > int code; > > - code = get_default_remote_submodule(update_data->sm_path, &remote_name); > + code = get_default_remote_submodule(update_data->sm_path, &remote_name, NULL); > if (code) > return code; > code = remote_submodule_branch(update_data->sm_path, &branch);