Re: [PATCH 3/3] diff: teach tree-diff a max-depth parameter

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 29, 2025 at 08:57:44PM +0200, Toon Claes wrote:
> From: Jeff King <peff@xxxxxxxx>
> 
> When you are doing a tree-diff, there are basically two options: do not
> recurse into subtrees at all, or recurse indefinitely. While most
> callers would want to always recurse and see full pathnames, some may
> want the efficiency of looking only at a particular level of the tree.
> This is currently easy to do for the top-level (just turn off
> recursion), but you cannot say "show me what changed in subdir/, but do
> not recurse".
> 
> This patch adds a max-depth parameter which is measured from the closest
> pathspec match, so that you can do:
> 
>   git log --raw --max-depth=1 -- a/b/c
> 
> and see the raw output for a/b/c/, but not those of a/b/c/d/
> (instead of the raw output you would see for a/b/c/d).

Hm, okay. So what happens if I pass both "a/b" and "a/b/c"? Would I see
the contents of both trees?

> diff --git a/Documentation/diff-options.adoc b/Documentation/diff-options.adoc
> index f3a35d8141..46e6ed2d67 100644
> --- a/Documentation/diff-options.adoc
> +++ b/Documentation/diff-options.adoc
> @@ -893,5 +893,33 @@ endif::git-format-patch[]
>  	reverted with `--ita-visible-in-index`. Both options are
>  	experimental and could be removed in future.
>  
> +--max-depth=<n>::
> +
> +	Limit diff recursion to `<n>` levels (implies `-r`). The depth
> +	is measured from the closest pathspec. Given a tree containing
> +	`foo/bar/baz`, the following list shows the matches generated by
> +	each set of options:
> ++
> +--
> + - `--max-depth=0 -- foo`: `foo`
> +
> + - `--max-depth=1 -- foo`: `foo/bar`
> +
> + - `--max-depth=1 -- foo/bar`: `foo/bar/baz`
> +
> + - `--max-depth=1 -- foo foo/bar`: `foo/bar/baz`

This partially answers my question, as we talk about the scenario where
we have "foo/bar/baz", but no "foo/qux". If we had the latter, would
that also be displayed here because it's 1 level deep from the closest
matching pathspec ("foo")?

> + - `--max-depth=2 -- foo`: `foo/bar/baz`
> +--
> ++
> +If no pathspec is given, the depth is measured as if all
> +top-level entries were specified. Note that this is different
> +than measuring from the root, in that `--max-depth=0` would
> +still return `foo`. This allows you to still limit depth while
> +asking for a subset of the top-level entries.
> ++
> +Note that this option is only supported for diffs between tree objects,
> +not against the index or working tree.

Let's also note that wildcard pathspecs aren't supported.

>  For more detailed explanation on these common options, see also
>  linkgit:gitdiffcore[7].
> diff --git a/diff-lib.c b/diff-lib.c
> index 244468dd1a..fa7c24c267 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -115,6 +115,9 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
>  	uint64_t start = getnanotime();
>  	struct index_state *istate = revs->diffopt.repo->index;
>  
> +	if (revs->diffopt.max_depth_valid)
> +		die("max-depth is not supported for worktree diffs");

This and the following error messages should be made translatable.

> diff --git a/diff.c b/diff.c
> index dca87e164f..c03a59ac3b 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -5622,6 +5625,18 @@ static int diff_opt_rotate_to(const struct option *opt, const char *arg, int uns
>  	return 0;
>  }
>  
> +static int diff_opt_max_depth(const struct option *opt,
> +			      const char *arg, int unset)
> +{
> +	struct diff_options *options = opt->value;
> +
> +	BUG_ON_OPT_NEG(unset);
> +	options->flags.recursive = 1;
> +	options->max_depth = strtol(arg, NULL, 10);

We're missing error handling in case the argument is not an integer. We
should probably use `git_parse_unsigned()` instead.

> @@ -5894,6 +5909,10 @@ struct option *add_diff_options(const struct option *opts,
>  		OPT_CALLBACK_F(0, "diff-filter", options, N_("[(A|C|D|M|R|T|U|X|B)...[*]]"),
>  			       N_("select files by diff type"),
>  			       PARSE_OPT_NONEG, diff_opt_diff_filter),
> +		OPT_CALLBACK_F(0, "max-depth", options, N_("<depth>"),
> +			       N_("maximum tree depth to recurse"),
> +			       PARSE_OPT_NONEG, diff_opt_max_depth),
> +
>  		{
>  			.type = OPTION_CALLBACK,
>  			.long_name = "output",

Okay. We don't use `OPT_UNSIGNED()` because we also want to impliy the
`recursive` flag. Wouldn't it be simpler though to use `OPT_UNSIGNED()`
and then set the flag in `diff_setup_done()` like we already do for a
couple of other options?

> diff --git a/diff.h b/diff.h
> index 62e5768a9a..e3767df237 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -404,6 +404,15 @@ struct diff_options {
>  	struct strmap *additional_path_headers;
>  
>  	int no_free;
> +
> +	/*
> +	 * The extra "valid" flag is a slight hack. The value "0" is perfectly
> +	 * valid for max-depth. We would normally use -1 to indicate "not set",
> +	 * but there are many code paths which assume that assume that just

Double "assume that".

> +	 * zero-ing out a diff_options is enough to initialize it.
> +	 */
> +	int max_depth;
> +	int max_depth_valid;

So in that case, shouldn't we convert `max_depth` to be unsigned and
`max_depth_valid` to be a boolean?

> diff --git a/tree-diff.c b/tree-diff.c
> index e00fc2f450..acd302500f 100644
> --- a/tree-diff.c
> +++ b/tree-diff.c
> @@ -48,6 +49,73 @@
>  		free((x)); \
>  } while(0)
>  
> +/* Returns true if and only if "dir" is a leading directory of "path" */
> +static int is_dir_prefix(const char *path, const char *dir, int dirlen)
> +{
> +	return !strncmp(path, dir, dirlen) &&
> +		(!path[dirlen] || path[dirlen] == '/');
> +}
> +
> +static int check_recursion_depth(struct strbuf *name,

Let's mark the `name` parameter as `const` both here and in
`should_recurse()` so that it becomes clear that it shouldn't be
modified.

> +				 const struct pathspec *ps,
> +				 int max_depth)
> +{
> +	int i;
> +
> +	if (!ps->nr)
> +		return within_depth(name->buf, name->len, 1, max_depth);
> +
> +	/*
> +	 * We look through the pathspecs in reverse-sorted order, because we
> +	 * want to find the longest match first (e.g., "a/b" is better for
> +	 * checking depth than "a/b/c").
> +	 */
> +	for (i = ps->nr - 1; i >= 0; i--) {

`nr` is of type `int` indeed, so the loop index is correct even though
it feels wrong. But that's nothing we have to fix as part of this patch
series. We could inline the declaration though and make it loop-local.

> +		const struct pathspec_item *item = ps->items+i;
> +
> +		/*
> +		 * If the name to match is longer than the pathspec, then we
> +		 * are only interested if the pathspec matches and we are
> +		 * within the allowed depth.
> +		 */
> +		if (name->len >= item->len) {
> +			if (!is_dir_prefix(name->buf, item->match, item->len))
> +				continue;
> +			return within_depth(name->buf + item->len,
> +					    name->len - item->len,
> +					    1, max_depth);
> +		}
> +
> +		/*
> +		 * Otherwise, our name is shorter than the pathspec. We need to
> +		 * check if it is a prefix of the pathspec; if so, we must
> +		 * always recurse in order to process further (the resulting
> +		 * paths we find might or might not match our pathspec, but we
> +		 * cannot know until we recurse).
> +		 */
> +		if (is_dir_prefix(item->match, name->buf, name->len))
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +static int should_recurse(struct strbuf *name, struct diff_options *opt)
> +{
> +	if (!opt->flags.recursive)
> +		return 0;
> +	if (!opt->max_depth_valid)
> +		return 1;
> +
> +	/*
> +	 * We catch this during diff_setup_done, but let's double-check
> +	 * against any internal munging.
> +	 */
> +	if (opt->pathspec.has_wildcard)
> +		die("BUG: wildcard pathspecs are incompatible with max-depth");

Let's use `BUG()` instead. This patch series must be old, the function
has been introduced 8 years ago in d8193743e0 (usage.c: add BUG()
function, 2017-05-12).

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux