Re: Bug: Git sometimes disregards wildcards in pathspecs if a file name matches exactly

Hi Piotr, Hello everyone,

Thanks for the clear bug report, Piotr. I can reproduce the behavior
you described in 2.49:

On Fri, Apr 11, 2025 at 9:08 PM Piotr Siupa <piotrsiupa@xxxxxxxxx> wrote:
>
> Hi! I think I've found a bug in the command "git add".
> It can be reproduced in a fresh repository by running:
>
> git init
> touch 'foo' 'f*'
> git add 'f*'
>
> The last command should add both files "f*" and "foo" to the index but
> it adds only "f*".
> Running it the second time works as expected. (It adds "foo" on the
> second attempt.)

Following the code path down from 'cmd_add' (in 'builtin/add.c'),
the issue appears to stem from how pathspecs are matched against
directory entries. This happens within the 'prune_directory' function,
which uses 'do_match_pathspec' internally (likely reached via
'dir_path_match' -> 'match_pathspec' -> 'match_pathspec_with_flags').

Here's a breakdown of what seems to be happening during that first
"git add 'f*'" call:

First, 'cmd_add' sees it needs to add new files. Then, 'fill_directory'
finds both untracked files: 'foo' and the literal 'f*'.

Next, 'prune_directory' is called to filter these using the pathspec ''f*''.
Inside 'prune_directory', the 'do_match_pathspec' function is called for
each file against the pathspec list (which just contains ''f*''). These
calls share a common marker array (often called 'seen') to track which
pathspecs have found a match so far. Because of that shared state, the
order matters: the failure described next needs the literal 'f*' to be
processed before 'foo'.

When 'do_match_pathspec' processes the literal file 'f*' against the
pathspec item ''f*'', it calls 'match_pathspec_item'. This helper function
likely returns a code like 'MATCHED_EXACTLY' because the pattern ''f*''
is byte-for-byte identical to the filename 'f*'. Consequently,
'do_match_pathspec' updates the 'seen' array for the ''f*'' pathspec to
mark it as exactly matched. Since a match was found, 'prune_directory'
decides to keep the 'f*' entry.

The problem arises when 'do_match_pathspec' processes the other file, 'foo',
against the same pathspec item ''f*''. Before doing the actual comparison,
it checks the 'seen' array and finds that the ''f*'' pathspec was already
marked 'MATCHED_EXACTLY' (from processing the literal 'f*' file).
An optimization check like 'if (seen && seen[i] == MATCHED_EXACTLY)'
then evaluates to true. This causes the loop to 'continue', skipping the
call to 'match_pathspec_item' entirely for the 'foo' file against the ''f*''
pattern. Because no match was found *in this specific call*, 'do_match_pathspec'
returns 0, and 'prune_directory' discards the 'foo' entry.

Finally, 'prune_directory' returns the filtered list, now containing only 'f*',
and 'add_files' adds only that file to the index.

On the *second* "git add 'f*'" call, 'fill_directory' only finds the
untracked 'foo'. 'do_match_pathspec' runs with a fresh 'seen' array,
so the 'MATCHED_EXACTLY' check is initially false. 'match_pathspec_item'
is called for 'foo', returns 'MATCHED_FNMATCH' (a glob match), and 'foo'
is correctly added.
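
The two runs described above can be modelled with a small, self-contained
C sketch. This is not Git's code: the 'sketch_*' names and the enum values
are made up for illustration, only a single pathspec is handled, and POSIX
fnmatch() stands in for Git's own wildcard matching. It only shows how a
shared 'seen' marker plus the early 'continue' could drop 'foo'.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Git's match levels; values are illustrative. */
enum { SKETCH_NO_MATCH = 0, SKETCH_FNMATCH = 1, SKETCH_EXACT = 2 };

/* Simplified item matcher: literal equality is reported before a glob match. */
static int sketch_match_item(const char *pattern, const char *name)
{
	if (!strcmp(pattern, name))
		return SKETCH_EXACT;
	if (!fnmatch(pattern, name, 0))
		return SKETCH_FNMATCH;
	return SKETCH_NO_MATCH;
}

/* Single-pathspec model of the matcher, including the 'seen' shortcut. */
static int sketch_match_pathspec(const char *pattern, const char *name,
				 char *seen)
{
	int how;

	/* Models the 'continue': the item matcher is never consulted. */
	if (seen && *seen == SKETCH_EXACT)
		return SKETCH_NO_MATCH;

	how = sketch_match_item(pattern, name);
	if (how && seen && *seen < how)
		*seen = how;	/* remember the best match so far */
	return how;
}

/*
 * Model of the pruning loop: one 'seen' marker is shared by all entries.
 * Keeps matching names at the front of the array and returns their count.
 */
static size_t sketch_prune(const char *pattern, const char **names, size_t n)
{
	char seen = SKETCH_NO_MATCH;
	size_t i, kept = 0;

	for (i = 0; i < n; i++)
		if (sketch_match_pathspec(pattern, names[i], &seen))
			names[kept++] = names[i];
	return kept;
}
```

In this model, pruning { "f*", "foo" } with the pattern "f*" keeps only
"f*", while pruning { "foo" } alone (the second run) keeps "foo". With
"foo" placed first, both entries survive, so the sketch is sensitive to
processing order.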

> I'm using Git 2.43.2. The current "next" (2.49.0.805.g082f7c87e0)
> seems to have the same behavior if I'm testing it correctly.

Yes, the relevant code structures in 'do_match_pathspec' appear similar
in recent versions, suggesting the behavior is likely consistent.

Conclusion:

The core issue seems to be that optimization check within 'do_match_pathspec':

  // inside do_match_pathspec loop:
  if (seen && seen[i] == MATCHED_EXACTLY)
          continue;

This optimization assumes that once a pathspec item has achieved an
"exact" match against *some* file, it doesn't need to be checked
against *any other* files during the same directory scan operation.

However, when a pathspec contains glob characters (like ''f*'') but
happens to *also* exactly match a literal filename ('f*'),
'match_pathspec_item' appears to return 'MATCHED_EXACTLY'. This triggers
the optimization, incorrectly preventing the *same* pathspec pattern ''f*''
from matching *other* files (like 'foo') via its intended glob behavior
during that initial scan.

A potential fix might be to have 'match_pathspec_item' not return
'MATCHED_EXACTLY' when the pathspec item contains glob characters,
or to make the 'seen' check in 'do_match_pathspec' skip only pathspec
items that are purely literal.
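
As a rough illustration of the second option, here is a self-contained C
sketch in which the shortcut is taken only for patterns without glob
characters. Everything in it is an assumption made for illustration: the
'fix_*' names, the enum values, the strpbrk()-based glob test and the use
of POSIX fnmatch() are mine, not Git's, and a real patch would have to
work with Git's own pathspec structures.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Git's match levels; values are illustrative. */
enum { FIX_NO_MATCH = 0, FIX_FNMATCH = 1, FIX_EXACT = 2 };

/* Hypothetical helper: does the pattern contain any glob-special byte? */
static int fix_has_glob(const char *pattern)
{
	return strpbrk(pattern, "*?[\\") != NULL;
}

/* Single-pathspec matcher; the shortcut applies to literal patterns only. */
static int fix_match_pathspec(const char *pattern, const char *name,
			      char *seen)
{
	int how = FIX_NO_MATCH;

	if (seen && *seen == FIX_EXACT && !fix_has_glob(pattern))
		return FIX_NO_MATCH;

	if (!strcmp(pattern, name))
		how = FIX_EXACT;
	else if (!fnmatch(pattern, name, 0))
		how = FIX_FNMATCH;

	if (how && seen && *seen < how)
		*seen = how;	/* remember the best match so far */
	return how;
}

/* Pruning loop with one shared 'seen' marker; returns the kept count. */
static size_t fix_prune(const char *pattern, const char **names, size_t n)
{
	char seen = FIX_NO_MATCH;
	size_t i, kept = 0;

	for (i = 0; i < n; i++)
		if (fix_match_pathspec(pattern, names[i], &seen))
			names[kept++] = names[i];
	return kept;
}
```

With this change the model keeps both "f*" and "foo" for the pattern "f*",
while a purely literal pattern such as "foo" still stops being compared
once it has matched exactly.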

Thanks again for spotting this subtle behavior!

-Jayatheerth



