From: Jacob Keller <jacob.keller@xxxxxxxxx> The --no-index option of git-diff enables using the diff machinery from git while operating outside of a repository. This mode of git diff is able to compare directories and produce a diff of their contents. When operating git diff in a repository, git has the notion of "pathspecs" which can specify which files to compare. In particular, when using git to diff two trees, you might invoke: $ git diff-tree -r <treeish1> <treeish2>. where the treeish could point to a subdirectory of the repository. When invoked this way, users can limit the selected paths of the tree by using a pathspec. Either by providing some list of paths to accept, or by removing paths via a negative refspec. The git diff --no-index mode does not support pathspecs, and cannot limit the diff output in this way. Other diff programs such as GNU difftools have options for excluding paths based on a pattern match. However, using git diff as a diff replacement has several advantages over many popular diff tools, including coloring moved lines, rename detections, and similar. Teach git diff --no-index how to handle pathspecs to limit the comparisons. This will only be supported if both provided paths are directories. This is because the --no-index mode already employs some DWIM shortcuts when dealing with comparing a directory and a file. Executing git diff --no-index D F is interpreted as if: $ git diff --no-index D/$(basename F) F Similarly, if you do git diff --no-index F D. Modify the fixup_paths function to return 1 if both paths are directories. If this is the case, interpret any extra arguments to git diff as pathspecs via parse_pathspec. Add a new PATHSPEC_NO_REPOSITORY flag to indicate to the parser that we do not have repository. Use this to aid in error message reporting in the event that the user happens to invoke git diff --no-index from a repository. Extend the prefix_path_gently function to correctly handle a prefix which does not end in '/', by inserting one as appropriate. Use parse_pathspec to load the remaining arguments (if any) to git diff --no-index as pathspec items. Disable PATHSPEC_ATTR support since we do not have a repository to do attribute lookup. Disable PATHSPEC_FROMTOP since we do not have a repository root. All pathspecs are treated as rooted at the provided comparison paths. Load the pathspecs twice, once prefixed with paths[0] and once prefixed with paths[1]. I considered trying to avoid this, but don't yet have a workable solution that reuses the same pathspec objects. We need the directory paths as prefixes otherwise the match_pathspec won't compare properly. Pass the pathspec object for both paths into queue_diff, who in-turn passes these along to read_directory_contents. Modify read_directory_contents to check against the pathspecs when scanning the directory. In order to properly recurse, we must handle leading directory checks. Thus, a new "match_leading_pathspecs" variant is created, which sets the DO_MATCH_LEADING_PATHSPEC flag. This is required to correctly handle nested directories. Consider this: $ git diff --no-index a b c/d This should include all paths in a and b which match the c/d pathspec. In particular, if there was 'a/c/d' we need to match it. But to check 'a/c/d', we need to first get to that part, which requires comparing 'a/c' first. Without DO_MATCH_LEADING_PATHSPEC, 'a/c' won't match the 'a/c/d' pathspec string. This implementation appears to work ok, but I still need to create some unit tests. I also think this might be a little hacky, and I am not sure how folks feel about parsing separate pathspecs for both entries. Some other gotchas and open questions: 1) pathspecs must all come after the first two path arguments, you can't re-arrange them to come first. I'm treating them sort of like the treeish arguments to git diff-tree. 2) The pathspecs are interpreted relative to the provided paths, and thus will always need to be specified as relative paths, and will be interpreted as relative to the root of the search for each path separately. 3) negative pathspecs have to be fully qualified from the root, i.e. ':(exclude)file' will only exclude 'a/file' and not 'a/b/file' unless you also use '(glob)' or similar. I think this matches the other pathspec support, but I an not 100% sure. I feel like some of these changes are a bit hacky and could use some further refinement or suggestion, but the core idea appears to work reasonably well! Signed-off-by: Jacob Keller <jacob.keller@xxxxxxxxx> --- This version uses pathspecs and hopefully works in a reasonable enough way to not be too confusing. Still needs tests, likely needs to be broken up into smaller pieces, or some of the hacks need to be smoothed over. I have executed this, but didn't run the full test suite yet. pathspec.h | 10 ++++++ diff-no-index.c | 85 ++++++++++++++++++++++++++++++++++++++++--------- dir.c | 24 ++++++++++++-- pathspec.c | 7 +++- setup.c | 4 ++- 5 files changed, 110 insertions(+), 20 deletions(-) diff --git a/pathspec.h b/pathspec.h index de537cff3cb6..363fb6309663 100644 --- a/pathspec.h +++ b/pathspec.h @@ -76,6 +76,11 @@ struct pathspec { * allowed, then it will automatically set for every pathspec. */ #define PATHSPEC_LITERAL_PATH (1<<6) +/* + * For git diff --no-index, indicate that we are operating without + * a repository or index. + */ +#define PATHSPEC_NO_REPOSITORY (1<<7) /** * Given command line arguments and a prefix, convert the input to @@ -184,6 +189,11 @@ int match_pathspec(struct index_state *istate, const char *name, int namelen, int prefix, char *seen, int is_dir); +int match_leading_pathspec(struct index_state *istate, + const struct pathspec *ps, + const char *name, int namelen, + int prefix, char *seen, int is_dir); + /* * Determine whether a pathspec will match only entire index entries (non-sparse * files and/or entire sparse directories). If the pathspec has the potential to diff --git a/diff-no-index.c b/diff-no-index.c index 9739b2b268b9..4535d032c27f 100644 --- a/diff-no-index.c +++ b/diff-no-index.c @@ -15,20 +15,43 @@ #include "gettext.h" #include "revision.h" #include "parse-options.h" +#include "pathspec.h" #include "string-list.h" #include "dir.h" -static int read_directory_contents(const char *path, struct string_list *list) +static int read_directory_contents(const char *path, struct string_list *list, + const struct pathspec *pathspec) { + struct strbuf match = STRBUF_INIT; + int len; DIR *dir; struct dirent *e; if (!(dir = opendir(path))) return error("Could not open directory %s", path); - while ((e = readdir_skip_dot_and_dotdot(dir))) - string_list_insert(list, e->d_name); + if (pathspec) { + strbuf_addstr(&match, path); + strbuf_complete(&match, '/'); + len = match.len; + } + while ((e = readdir_skip_dot_and_dotdot(dir))) { + if (pathspec) { + strbuf_setlen(&match, len); + strbuf_addstr(&match, e->d_name); + + if (!match_leading_pathspec(NULL, pathspec, + match.buf, match.len, + 0, NULL, + e->d_type == DT_DIR ? 1 : 0)) + continue; + } + + string_list_insert(list, e->d_name); + } + + strbuf_release(&match); closedir(dir); return 0; } @@ -131,7 +154,8 @@ static struct diff_filespec *noindex_filespec(const struct git_hash_algo *algop, } static int queue_diff(struct diff_options *o, const struct git_hash_algo *algop, - const char *name1, const char *name2, int recursing) + const char *name1, const char *name2, int recursing, + const struct pathspec *ps1, const struct pathspec *ps2) { int mode1 = 0, mode2 = 0; enum special special1 = SPECIAL_NONE, special2 = SPECIAL_NONE; @@ -171,9 +195,9 @@ static int queue_diff(struct diff_options *o, const struct git_hash_algo *algop, int i1, i2, ret = 0; size_t len1 = 0, len2 = 0; - if (name1 && read_directory_contents(name1, &p1)) + if (name1 && read_directory_contents(name1, &p1, ps1)) return -1; - if (name2 && read_directory_contents(name2, &p2)) { + if (name2 && read_directory_contents(name2, &p2, ps2)) { string_list_clear(&p1, 0); return -1; } @@ -218,7 +242,7 @@ static int queue_diff(struct diff_options *o, const struct git_hash_algo *algop, n2 = buffer2.buf; } - ret = queue_diff(o, algop, n1, n2, 1); + ret = queue_diff(o, algop, n1, n2, 1, ps1, ps2); } string_list_clear(&p1, 0); string_list_clear(&p2, 0); @@ -258,8 +282,10 @@ static void append_basename(struct strbuf *path, const char *dir, const char *fi * DWIM "diff D F" into "diff D/F F" and "diff F D" into "diff F D/F" * Note that we append the basename of F to D/, so "diff a/b/file D" * becomes "diff a/b/file D/file", not "diff a/b/file D/a/b/file". + * + * Return 1 if both paths are directories, 0 otherwise. */ -static void fixup_paths(const char **path, struct strbuf *replacement) +static int fixup_paths(const char **path, struct strbuf *replacement) { struct stat st; unsigned int isdir0 = 0, isdir1 = 0; @@ -282,25 +308,30 @@ static void fixup_paths(const char **path, struct strbuf *replacement) if ((isdir0 && ispipe1) || (ispipe0 && isdir1)) die(_("cannot compare a named pipe to a directory")); - if (isdir0 == isdir1) - return; + /* if both paths are directories, we will enable pathspecs */ + if (isdir0 && isdir1) + return 1; + if (isdir0) { append_basename(replacement, path[0], path[1]); path[0] = replacement->buf; - } else { + } else if (isdir1) { append_basename(replacement, path[1], path[0]); path[1] = replacement->buf; } + + return 0; } static const char * const diff_no_index_usage[] = { - N_("git diff --no-index [<options>] <path> <path>"), + N_("git diff --no-index [<options>] <path> <path> [<pathspec>...]"), NULL }; int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop, int implicit_no_index, int argc, const char **argv) { + struct pathspec pathspec1, pathspec2, *ps1 = NULL, *ps2 = NULL; int i, no_index; int ret = 1; const char *paths[2]; @@ -317,7 +348,7 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop, options = add_diff_options(no_index_options, &revs->diffopt); argc = parse_options(argc, argv, revs->prefix, options, diff_no_index_usage, 0); - if (argc != 2) { + if (argc < 2) { if (implicit_no_index) warning(_("Not a git repository. Use --no-index to " "compare two paths outside a working tree")); @@ -337,7 +368,27 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop, paths[i] = p; } - fixup_paths(paths, &replacement); + /* TODO: should we try to catch pathspec-like paths first and warn or + * error? We accepted those as valid 'paths' before so it seems + * unlikely we can change that behavior. + */ + if (fixup_paths(paths, &replacement)) { + parse_pathspec(&pathspec1, PATHSPEC_FROMTOP | PATHSPEC_ATTR, + PATHSPEC_PREFER_FULL | PATHSPEC_NO_REPOSITORY, + paths[0], &argv[2]); + if (pathspec1.nr) + ps1 = &pathspec1; + + parse_pathspec(&pathspec2, PATHSPEC_FROMTOP | PATHSPEC_ATTR, + PATHSPEC_PREFER_FULL | PATHSPEC_NO_REPOSITORY, + paths[1], &argv[2]); + if (pathspec2.nr) + ps2 = &pathspec2; + } else if (argc > 2) { + warning(_("Limiting comparison with pathspecs is only " + "supported if both paths are directories.")); + usage_with_options(diff_no_index_usage, options); + } revs->diffopt.skip_stat_unmatch = 1; if (!revs->diffopt.output_format) @@ -354,7 +405,7 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop, setup_diff_pager(&revs->diffopt); revs->diffopt.flags.exit_with_status = 1; - if (queue_diff(&revs->diffopt, algop, paths[0], paths[1], 0)) + if (queue_diff(&revs->diffopt, algop, paths[0], paths[1], 0, ps1, ps2)) goto out; diff_set_mnemonic_prefix(&revs->diffopt, "1/", "2/"); diffcore_std(&revs->diffopt); @@ -370,5 +421,9 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop, for (i = 0; i < ARRAY_SIZE(to_free); i++) free(to_free[i]); strbuf_release(&replacement); + if (ps1) + clear_pathspec(ps1); + if (ps2) + clear_pathspec(ps2); return ret; } diff --git a/dir.c b/dir.c index a374972b6243..7f6079475397 100644 --- a/dir.c +++ b/dir.c @@ -397,9 +397,13 @@ static int match_pathspec_item(struct index_state *istate, strncmp(item->match, name - prefix, item->prefix)) return 0; - if (item->attr_match_nr && - !match_pathspec_attrs(istate, name - prefix, namelen + prefix, item)) - return 0; + if (item->attr_match_nr) { + /* TODO: is there a better way to handle this? */ + if (!istate) + BUG("magic PATHSPEC_ATTR requires an index"); + if (!match_pathspec_attrs(istate, name - prefix, namelen + prefix, item)) + return 0; + } /* If the match was just the prefix, we matched */ if (!*match) @@ -577,6 +581,20 @@ int match_pathspec(struct index_state *istate, prefix, seen, flags); } +int match_leading_pathspec(struct index_state *istate, + const struct pathspec *ps, + const char *name, int namelen, + int prefix, char *seen, int is_dir) +{ + unsigned flags = DO_MATCH_LEADING_PATHSPEC; + + if (is_dir) + flags |= DO_MATCH_DIRECTORY; + + return match_pathspec_with_flags(istate, ps, name, namelen, + prefix, seen, flags); +} + /** * Check if a submodule is a superset of the pathspec */ diff --git a/pathspec.c b/pathspec.c index 2b4e434bc0aa..7a46da50dc77 100644 --- a/pathspec.c +++ b/pathspec.c @@ -492,7 +492,12 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags, if (!match) { const char *hint_path; - if (!have_git_dir()) + /* TODO: should we have a different error message? + * Really, we can't have absolute paths for pathspec + * in git diff --no-index since it must be relative to + * both directories in order to work at all. + */ + if ((flags & PATHSPEC_NO_REPOSITORY) || !have_git_dir()) die(_("'%s' is outside the directory tree"), copyfrom); hint_path = repo_get_work_tree(the_repository); diff --git a/setup.c b/setup.c index f93bd6a24a5d..56ebf1218442 100644 --- a/setup.c +++ b/setup.c @@ -139,7 +139,9 @@ char *prefix_path_gently(const char *prefix, int len, return NULL; } } else { - sanitized = xstrfmt("%.*s%s", len, len ? prefix : "", path); + sanitized = xstrfmt("%.*s%s%s", len, len ? prefix : "", + len && prefix[len - 1] == '/' ? "" : "/", + path); if (remaining_prefix) *remaining_prefix = len; if (normalize_path_copy_len(sanitized, sanitized, remaining_prefix)) { -- 2.48.1.397.gec9d649cc640