Re: [PATCH v2] submodule: prevent overwriting .gitmodules entry on path reuse

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 13 May 2025 14:44:33 -0700

K Jayatheerth <jayatheerthkulkarni2005@xxxxxxxxx> writes:

> When a submodule is added at a path that previously hosted another submodule
> (e.g., 'child'), Git reuses the submodule name derived from the path and
> updates the corresponding entry in .gitmodules. This can silently overwrite
> existing configuration if the old submodule was only moved (e.g., to
> 'child_old') without renaming the submodule.

OK.

> This patch improves the `module_add()` logic by checking whether the
> submodule name already exists in the config but maps to a different path.

We frown upon a patch that says "This patch does X"; just give an
order to the codebase to "be like so".  I.e. "Improve the module-add
by doing X..." is how we phrase a proposed change.

> In such a case, Git now errors out unless `--force` is specified, thus
> preventing accidental overwrites. To proceed safely, the user can provide
> a new name via `--name` or use `--force`.

The above explains what happens in module_add() quite well.

What is puzzling about this change is that the new helper function
and the changes to configure_added_submodule() are not described at
all in the proposed log message.  How are they relevant and why do we
need them?

> @@ -3443,6 +3452,7 @@ static int module_add(int argc, const char **argv, const char *prefix,
>  	int force = 0, quiet = 0, progress = 0, dissociate = 0;
>  	struct add_data add_data = ADD_DATA_INIT;
>  	const char *ref_storage_format = NULL;
> +	const struct submodule *existing;
>  	char *to_free = NULL;
>  	struct option options[] = {
>  		OPT_STRING('b', "branch", &add_data.branch, N_("branch"),
> @@ -3546,6 +3556,32 @@ static int module_add(int argc, const char **argv, const char *prefix,
>  	if(!add_data.sm_name)
>  		add_data.sm_name = add_data.sm_path;
>  
> +	existing = submodule_from_name(the_repository,
> +					null_oid(the_hash_algo),
> +					add_data.sm_name);
> +
> +	if (existing && strcmp(existing->path, add_data.sm_path)) {

If the name is in use, and the submodule with that name is at a
different path, then we are in trouble, OK.

> +		if (!force)
> +			die(_("submodule name '%s' already used for path '%s'"),
> +			add_data.sm_name, existing->path);
> +
> +		/* --force: build <name><n> until unique */
> +		struct strbuf buf = STRBUF_INIT;

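This declaration comes after a statement in the same block, which
"-Wdeclaration-after-statement" flags.  Move it to the top of the
block.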
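
To illustrate the shape that warning asks for, here is a minimal,
standalone sketch.  It is not code from the patch: format_candidate()
is a hypothetical helper, and plain snprintf() stands in for the
strbuf API.  The only point is that every declaration in a block
precedes its first statement.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git): write "<name><n>" into 'out'.
 *
 * The form the warning flags looks like this:
 *
 *     assert(out && name);
 *     int len;              <- declaration after a statement
 *
 * Below, the declaration is hoisted to the top of the block instead.
 */
static int format_candidate(char *out, size_t outlen, const char *name, int n)
{
	int len; /* all declarations come first */

	assert(out && name); /* statements start here */
	len = snprintf(out, outlen, "%s%d", name, n);

	/* Report truncation or an encoding error as failure. */
	return (len < 0 || (size_t)len >= outlen) ? -1 : 0;
}
```

The same applies to the loop counter in the hunk below: declaring
"int i" alongside the other variables at the top of the block avoids
relying on a declaration inside the for statement.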

> +		strbuf_addstr(&buf, add_data.sm_name);
> +
> +		for (int i = 1; ; i++) {
> +			strbuf_setlen(&buf, 0);
> +			strbuf_addf(&buf, "%s%d", add_data.sm_name, i);
> +
> +			if (!submodule_from_name(the_repository,
> +						null_oid(the_hash_algo),
> +						buf.buf))
> +				break;
> +		}
> +
> +		add_data.sm_name = strbuf_detach(&buf, NULL);
> +	}
> +

What is the memory ownership rule for add_data.sm_name?  Earlier, in
the pre-context of a hunk, we saw it assigned from add_data.sm_path,
so in that codepath it points at borrowed memory.  Here, however, the
member has to own the string detached from the strbuf, which somebody
must eventually free; otherwise it is a memory leak.