Junio C Hamano wrote:
Torsten Bögershausen <tboegi@xxxxxx> writes:
The word canonical has been removed.
After reading the help for
'pwd -P' and 'cd -P'
"absolute" is replaced by "physical".
A matter of taste.
If "absolute" is used more here in Git, I am fine with either.
It is OK as long as we are locally consistent. I do think inside
our codebase it seems we use "absolute" more, but the change in
question is about use of "-P" option, which certainly was taken from
"physical", in our test scripts, so I am OK with your description
below to use that word.
If somebody really cared (and I don't), we may want to pick a single
word among physical, absolute, and real, but the only thing is that
we are using them interchangeably, so as long as we make it clear
(e.g. perhaps strbuf_realpath() and the underlying helper functions
that are used by it may have a comment or two that says that we use
these three words interchangeably to our developers), it would be
good enough.
An "absolute" path is well-defined and commonly understood to have a
singular meaning. These paths are relative to the root directory, and are
identified by a leading separator (/). POSIX specifies this at XBD.3.2[1]
and XBD.4.16[2].
This change is not concerned with absolute paths. All of the paths in
question are absolute, both before and after this change.
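As a minimal sketch of that point, the following shell fragment builds a scratch directory containing a symbolic link and prints the same directory two ways. The layout (`real`, `link`) and the variable names are hypothetical, and `mktemp -d` is assumed to be available even though POSIX does not specify it. Both results begin with `/`, so both are absolute; they differ only in whether the symbolic link has been resolved.

```shell
# Hypothetical layout: "real" is a directory, "link" is a symlink to it.
# Resolve the scratch directory first so that $tmp itself has no symlinks.
tmp=$(cd -P "$(mktemp -d)" && pwd -P)
mkdir "$tmp/real"
ln -s real "$tmp/link"

# Same directory, two spellings. Each cd runs in a subshell, so the
# caller's working directory is left alone.
logical=$(cd -L "$tmp/link" && pwd -L)    # symlink component kept
physical=$(cd -L "$tmp/link" && pwd -P)   # symlink component resolved

echo "logical:  $logical"
echo "physical: $physical"

# Both strings are absolute paths, yet they do not compare equal.
case "$logical"  in /*) echo "logical is absolute"  ;; esac
case "$physical" in /*) echo "physical is absolute" ;; esac

rm -rf "$tmp"
```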
There is no single term that's widely understood to have the meaning that
this change is concerned with. "Physical" is a contender, but this term is
not broadly understood without providing additional context. In fact,
while "physical" is the namesake for pwd's `-P` option, the POSIX
definition of pwd(1) doesn't even use this term, instead explaining the
concept fully in prose[3]. I don't believe it's appropriate to discuss
"physical" paths without an introduction clarifying its intended meaning.
`realpath` is a library interface that transforms paths to those having
the semantics at issue, but it's somewhat obscure, and easily confused
with "real path", whose meaning would be entirely ambiguous. The realpath(3)
documentation from POSIX[4] explains the semantics fully; glibc[5] and
Linux man-pages[6] provide a full explanation while also using the term
"canonicalize".
"Canonicalize" alone is too generic, because there are several axes of
canonicalization that may apply to a path, some filesystem-dependent. This
change is concerned with canonicalization via symbolic link resolution,
but in other contexts, path canonicalization may refer to other concepts,
such as case matching on case-preserving filesystems, or encoding
canonicalization (such as Unicode normalization) on filesystems that have
defined encoding rules.
All of this illustrates the difficulty in choosing a single term to
unambiguously convey the meaning. I chose to write a commit message that
favored technical precision, even if it meant tending toward what Junio
called "the more verbose and repetitive side". I believed that to be
necessary to fully explain the background, the problem, and the solution.
I don't know that this amount of scrutiny of a commit message that's
precise and correct is entirely justified, but if there's a strong
preference to converge on a single term, provided that it's not
technically inaccurate, I'll accede to the request. I don't expect that it
will shrink the amount of explanatory text, though, because as I've
illustrated, there isn't really a term in existence that conveys this
principle generally in a way that's suitable for broad understanding.
[1] https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_02
[2] https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap04.html#tag_04_16
[3] https://pubs.opengroup.org/onlinepubs/9799919799/utilities/pwd.html
[4] https://pubs.opengroup.org/onlinepubs/9699919799/functions/realpath.html
[5] https://sourceware.org/glibc/manual/2.41/html_node/Symbolic-Links.html#index-realpath
[6] https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man/man3/realpath.3
==================================================
t: run 7900 tests from the physical working directory
Some tests make git perform actions that produce observable pathnames,
and have expectations on those paths. Tests run with $HOME set to a
$TRASH_DIRECTORY, and with their working directory the same
$TRASH_DIRECTORY, although these paths are physical identical, they do
not observe the same pathname normalization rules and thus might not
be represented by strings that compare equal.
In particular, no pathname
normalization is applied to $TRASH_DIRECTORY or $HOME, while tests
change their working directory with `cd -P` which resolves symbolic links
returning the physical path.
"physical identical"? I think the problem is $HOME and $TRASH are
the same but not physically normalized, which means "cd $HOME &&
pwd", "cd $HOME && pwd -P" and "cd -P $HOME && pwd" can give
different results from these two variables. How about replacing the
latter half of the above with something much simpler, like this?
... although HOME and TRASH_DIRECTORY have identical values, the
physical path to it (i.e. what "cd $HOME && pwd -P" reports) may
be different.
t7900's macOS maintenance tests (which are not limited to running on
macOS) have an expectation on a path that `git maintenance` forms by
using strbuf_realpath() from abspath.c to resolve the physical path
based on $HOME. When t7900 runs from a working directory that contains
symbolic links in its pathname, $HOME will also contain symbolic links,
which `git maintenance` resolves but the test's expectation does not,
causing a test failure.
Align $TRASH_DIRECTORY and $HOME with the physical path as used for
the working directory by resetting them to match the working directory
after it's established by `cd -P`. With all paths in agreement and
symbolic links resolved, pathname expectations can be set and met based
on string comparison without regard to external environmental factors
such as the presence of symbolic links in a path.
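The mismatch and the realignment described above can be sketched in plain shell. This is an illustration under stated assumptions, not the actual change to the test library: the scratch layout and the `scratch`, `resolved`, `before` and `after` names are hypothetical, `pwd -P` stands in for the symbolic link resolution that `git maintenance` performs, and `mktemp -d` is assumed to be available.

```shell
# Hypothetical stand-in for the test harness setup.
scratch=$(cd -P "$(mktemp -d)" && pwd -P)
mkdir "$scratch/real"
ln -s real "$scratch/link"

TRASH_DIRECTORY="$scratch/link"   # contains a symbolic link, not normalized
HOME="$TRASH_DIRECTORY"

# Stand-in for a tool that resolves symbolic links starting from $HOME.
resolved=$(cd "$HOME" && pwd -P)

# Before: an expectation built from $HOME does not match the resolved path.
if [ "$HOME" = "$resolved" ]; then before=match; else before=mismatch; fi

# Realign: enter the directory with cd -P, then reset both variables
# to the working directory that cd -P established.
cd -P "$TRASH_DIRECTORY" &&
TRASH_DIRECTORY=$(pwd) &&
HOME="$TRASH_DIRECTORY"

# After: the same string comparison now succeeds.
if [ "$HOME" = "$resolved" ]; then after=match; else after=mismatch; fi

echo "before: $before, after: $after"
# prints "before: mismatch, after: match"

cd / && rm -rf "$scratch"
```

Once the variables are reset, the comparison no longer depends on whether the directory the tests were started from has a symbolic link somewhere in its pathname.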