Re: [PATCH 2/3] within_depth: fix return for empty path

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 29, 2025 at 08:57:43PM +0200, Toon Claes wrote:
> From: Jeff King <peff@xxxxxxxx>
> 
> The within_depth() function is used to check whether pathspecs limited
> by a max-depth parameter are acceptable. It takes a path to check, a
> maximum depth, and a "base" depth. It counts the components in the
> path (by counting slashes), adds them to the base, and compare them to

s/compare/&s/

> the maximum.
> 
> However, if the base does not have any slashes at all, we always return
> `true`. If the base depth is 0, then this is correct; no matter what the
> maximum is, we are always within it. However, if the base depth is
> greater than 0, then we might return an erroneous result.
> 
> This ends up not causing any user-visible bugs in the current code. The
> call sites in dir.c always pass a base depth of 0, so are unaffected.
> But tree_entry_interesting() uses this function differently: it will
> pass the prefix of the current entry, along with a `1` if the entry is a
> directory, in essence checking whether items inside the entry would be
> of interest. It turns out not to make a difference in behavior, but the
> reasoning is complex.
> 
> Given a tree like:
> 
>   file
>   a/file
>   a/b/file
> 
> walking the tree and calling tree_entry_interesting() will yield the
> following results:
> 
>   (with max_depth=0):
>       file: yes
>          a: yes
>     a/file: no
>        a/b: no
> 
>   (with max_depth=1):
>       file: yes
>          a: yes
>     a/file: yes
>        a/b: no
> 
> So we have inconsistent behavior in considering directories interesting.
> If they are at the edge of our depth but at the root, we will recurse
> into them, but then find all of their entries uninteresting (e.g., in
> the first case, we will look at "a" but find "a/*" uninteresting). But
> if they are at the edge of our depth and not at the root, then we will
> not recurse (in the second example, we do not even bother entering
> "a/b").
> 
> This turns out not to matter because the only caller which uses
> max-depth pathspecs is cmd_grep(), which only cares about blob entries.
> From its perspective, it is exactly the same to not recurse into a
> subtree, or to recurse and find that it contains no matching entries.
> Not recursing is merely an optimization.

Okay, well-explained.

> It is debatable whether tree_entry_interesting() should consider such an
> entry interesting. The only caller does not care if it sees the tree
> itself, and can benefit from the optimization. But if we add a
> "max-depth" limiter to regular diffs, then a diff with
> DIFF_OPT_TREE_IN_RECURSIVE would probably want to show the tree itself,
> but not what it contains.

I haven't yet read the cover letter, but I guess that the scenario where
we'll care about this is in git-last-modified(1) if we want to teach
that command a `--max-depth` parameter?

> This patch just fixes within_depth(), which means we consider such
> entries uninteresting (and makes the current caller happy). If we want
> to change that in the future, then this fix is still the correct first
> step, as the current behavior is simply inconsistent.

So... do we end up with this now?

   (with max_depth=1):
       file: yes
          a: yes
     a/file: no
        a/b: no

I think it would be nice to include that in the message to show the
change in behaviour at a glance.

> Signed-off-by: Jeff King <peff@xxxxxxxx>
> Signed-off-by: Toon Claes <toon@xxxxxxxxx>
> ---
>  dir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/dir.c b/dir.c
> index 02873f59ea..900ee5e559 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -277,7 +277,7 @@ int within_depth(const char *name, int namelen,
>  		if (depth > max_depth)
>  			return 0;
>  	}
> -	return 1;
> +	return depth <= max_depth;
>  }

Just for my own understanding: the only difference is when we don't have
even a single matching slash, as we don't verify `depth > max_depth` in
that case. So in theory, we could modify the function to the following
equivalent:

	int within_depth(const char *name, int namelen,
				int depth, int max_depth)
	{
		const char *cp = name, *cpe = name + namelen;

		if (depth > max_depth)
			return 0;

		while (cp < cpe) {
			if (*cp++ != '/')
				continue;
			depth++;
			if (depth > max_depth)
				return 0;
		}
		return 1;
	}

(Not saying we should, I'm just double checking my understanding).

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux