Re: [PATCH v7 2/2] submodule: skip redundant active entries when pattern covers path

JAYATHEERTH K <jayatheerthkulkarni2005@xxxxxxxxx> · Fri, 6 Jun 2025 20:11:42 +0530

On Wed, Jun 4, 2025 at 9:52 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> JAYATHEERTH K <jayatheerthkulkarni2005@xxxxxxxxx> writes:
>
> >> The new helper is a maintenance burden to keep in sync with
> >> submodule.c:is_tree_submodule_active(); if we really want to go this
> >> route, the patch should extract that "ah, submodule.active is set so
> >> let's turn it into pathspec and see if the path matches" part of the
> >> logic to make sure the logic is shared.  But I am wondering if we
> >> can do this without any new helper.
> >
> > I've actually done something like this, but I've wrapped the core logic within
> > the if else after checking [1]
>
> I do not think I follow what you want to say here.  Care to rephrase?
>
> >> I.e.  Can we replace this if() condition with something like this?
> >>
> >>         /*
> >>          * Explicitly set 'submodule.<name>.active' only if it is not
> >>          * 'active' due to other reasons.
> >>          */
> >>         if (!is_submodule_active(the_repository, add_data->sm_path)) {
> >>
> >> That is, we ask if the submodule is already active (we are before
> >> adding submodule.<name>.active for this thing---if may be active due
> >> to submodule.active or submodule.<name>.url) and enter the block
> >> only when it is not yet.
> >>
> >> That way, this codepath does not have to worry about the exact logic
> >> that determines if a submodule is 'active' even when its .active
> >> configuration variable is not set.
> >>
> >> >               key = xstrfmt("submodule.%s.active", add_data->sm_name);
> >> >               git_config_set_gently(key, "true");
> >> >               free(key);
> >
> >
> > I've done the exact thing in a bit different way
>
> I am confused again, as this reads as if it is an oxymoron --- did
> you write the exactly the same thing as suggested above, or did you
> write something different?
>

Ok so I'm expand on why I thought this would work;

- if (!git_config_get("submodule.active")) {
- /*
- * If the submodule being added isn't already covered by the
- * current configured pathspec, set the submodule's active flag
- */
- if (!is_submodule_active(the_repository, add_data->sm_path)) {
+ if (git_config_get("submodule.active") || /* key absent */
+   git_config_get_string_multi("submodule.active", &values)) {
+ /* submodule.active is missing -> force-enable */
+ key = xstrfmt("submodule.%s.active", add_data->sm_name);
+ git_config_set_gently(key, "true");
+ free(key);
+ } else {
+ for (i = 0; i < values->nr; i++) {
+ const char *pat = values->items[i].string;
+ if (!wildmatch(pat, add_data->sm_path, 0)) { /* match found */
+ matched = 1;
+ break;
+ }
+ }
+ if (!matched) { /* no pattern matched -> force-enable */
  key = xstrfmt("submodule.%s.active", add_data->sm_name);
  git_config_set_gently(key, "true");
  free(key);
  }
- } else {
- key = xstrfmt("submodule.%s.active", add_data->sm_name);
- git_config_set_gently(key, "true");
- free(key);
  }
 }

To clarify: My aim with the current patch wasn't to create a
general-purpose helper, but to very precisely control the conditions
under which submodule.<name>.active = true is written during a git
submodule add operation. But I sure did have a thought that something
like this should exist.

The goals I considerat core when adding a new submodules should be
something like
a. The submodule should become active. This is the primary user expectation.
b.The configuration should be as clean and intentional as possible.
Avoid redundant entries if the global submodule.active already covers
the new submodule.
b. The configuration should explicitly reflect the active state if
there's no global policy, or if the global policy.

This was my underlying goal to be very specific.

Your suggestion to use if (!is_submodule_active(the_repository,
add_data->sm_path)) is elegant in its reuse of existing logic.
However, I find these following things

i. Scenario: submodule.active is globally ABSENT.
is_submodule_active() will return true (as submodules are active by
default in this case).

Therefore, !is_submodule_active() will be false.

Result with your suggestion: submodule.<name>.active = true would not
be written. The submodule is active by default, but this isn't
explicitly recorded for the submodule itself. Also a reason why the
test failed after digging a bit deeper this is the possible
explanation I found.
Specifically Test case 9 from t7413

test_expect_success 'is-active, submodule.active and submodule add' '
test_when_finished "rm -rf super2" &&
git init super2 &&
test_commit -C super2 initial &&
git -C super2 config --add submodule.active "sub*" &&

# submodule add should only add submodule.<name>.active
# to the config if not matched by the pathspec
git -C super2 submodule add ../sub sub1 &&
test_must_fail git -C super2 config --get submodule.sub1.active &&

git -C super2 submodule add ../sub mod &&
git -C super2 config --get submodule.mod.active
'

ii. My patch's behavior in this scenario (And not as a general purpose helper)

The first condition git_config_get("submodule.active") ||
git_config_get_string_multi("submodule.active", &values) evaluates to
true because submodule.active is absent.

Result with my patch: submodule.<name>.active = true is explicitly written.

Let me also clarify on what I mean by the _similar_ and _different_
statements I sent above.
The outcome is what I meant was _similar_ the patch link I sent in the
above mail, explicitly adds only when it doesn't find
new submodule to be active, and I verified it with a documented test
added as the tenth test in t7413 where it shouldn't add
an active twice and the outcome is what I found _similar_, the
_difference_ however are in the methods of getting to the outcome,
and adding guardrails. That's the underlying thought with that statement.

To reinforce the above ideas her e aer some points I would like to make:

The logic implemented in the else block (reading submodule.active and
iterating with wildmatch) is indeed a partial re-implementation of
what is_submodule_active might do internally. However:

It's highly localized to this specific decision point in git submodule add.
It's concerned only with the submodule.active pathspecs.
is_submodule_active might have broader considerations (as you hinted,
e.g., submodule.<name>.url's presence, though that's less relevant for
the .active flag itself).
The benefit of precise control over config generation for add might
outweigh the risk of this small, localized logic diverging, especially
since submodule.active's pathspec interpretation is fairly stable.

The core distinction is that is_submodule_active() answers "Is this
submodule considered active right now based on all existing rules and
defaults?". My patch attempts to answer a slightly different question
for git add: "Given the current global submodule.active policy, do we
need to explicitly set submodule.<name>.active = true for this new
submodule to ensure it's active and clearly recorded as such?"
And that is the reason why I had to mention the _different_ and _same_
statement within the same statement.

My approach aims to set submodule.<name>.active = true unless an
existing, valid, and matching global submodule.active pattern already
makes it active. This leads to what I feel is a more robust and
explicit configuration outcome from git submodule add.

Perhaps the ideal long-term solution would involve a more nuanced
helper function (which I'm more than happy to write, if I get the
green flag.), but for the immediate improvement to git add's behavior,
I believe this patch strikes a good balance, prioritizing explicit
activation and config clarity for newly added submodules.

Also to clarify :)
I tried to dig and clarify as much as I can
I maybe wrong in some places cause obviously I'm still relatively much
newer to the source code.

But I thought providing a huge para with what I thought would be
helpful for corrections.

Thank you for reading all the blab : )

- Jayatheerth