Hi Lucas
On 23/06/2025 19:49, Lucas Seiki Oshiro wrote:
>> I think using an output format generated by 'printf("%s\n%s\0", key,
>> value)' would be easier to parse. This format matches that used by 'git
>> config --list -z'.
> Thanks for your suggestion! However, this still breaks in the corner case
> mentioned by Junio in
> https://lore.kernel.org/git/xmqqikl3mtx2.fsf@gitster.g/:
> when a value contains an LF, which will be possible in the (yet to be
> implemented) path values.

The reason git uses NUL termination for other commands is to prevent
breaking the output when values contain newlines. The output format I'm
suggesting is

    <key><LF><value><NUL>

so the output for "path.git-dir" written as a C string would be

    "path.git-dir\n/home/phil/src/git/.git\0"

The value can safely contain newlines because it is terminated by '\0'.
The reason that "git config --list -z" exists is to provide an
unambiguous output format as config values can contain newlines.
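A minimal C sketch of the <key><LF><value><NUL> format, for illustration only: one function writes a record and another looks up a key by splitting on NUL first and then at the first LF, so a value containing newlines survives intact. The function names are hypothetical and this is not code from git. Note that a literal "\0" inside a printf() format string ends the format early, so the sketch writes the NUL separately with fputc().

```c
#include <stdio.h>
#include <string.h>

/*
 * Write one record as <key><LF><value><NUL>.  A literal "\0" in a
 * printf() format string would end the format early, so the NUL is
 * written separately with fputc().
 */
static void write_record(FILE *out, const char *key, const char *value)
{
	fprintf(out, "%s\n%s", key, value);
	fputc('\0', out);
}

/*
 * Look up `want` in a buffer of <key><LF><value><NUL> records.  The
 * buffer is split on NUL first and each record is then split at its
 * first LF, so any later LF belongs to the value.  Returns a pointer to
 * the NUL-terminated value inside `buf`, or NULL if the key is absent.
 */
static const char *lookup_record(const char *buf, size_t len,
				 const char *want)
{
	size_t want_len = strlen(want);
	size_t pos = 0;

	while (pos < len) {
		const char *rec = buf + pos;
		const char *end = memchr(rec, '\0', len - pos);
		const char *lf;

		if (!end)
			return NULL; /* final record is not NUL-terminated */
		lf = memchr(rec, '\n', (size_t)(end - rec));
		if (lf && (size_t)(lf - rec) == want_len &&
		    !memcmp(rec, want, want_len))
			return lf + 1; /* value runs up to the NUL at `end` */
		pos = (size_t)(end - buf) + 1;
	}
	return NULL;
}
```

A caller in any language can parse the stream the same way: split on NUL, then split each record once at the first LF. Writing "path.git-dir" followed by a hypothetical key whose value is "first\nsecond" and looking both up returns each value unchanged.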

>> I've not seen any discussion of how paths are going to be encoded in the
>> JSON output. As I understand it some JSON decoders only accept UTF-8 input
>> but the paths reported by git are arbitrary NUL terminated byte sequences.
>> How is one expected to parse the output for a non-UTF-8 encoded path using
>> Rust's JSON decoding for example?
> [...]
> The first solution that I can think of is to check if the sequence is a valid
> UTF-8 byte string, aborting the entire command if it's not, which would be
> better than just guessing the charset and re-encoding it as UTF-8. However,
> I don't know how hard it would be to do.

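For what it's worth, the check itself is small. The sketch below is a hypothetical helper, not git's own code: it accepts only well-formed UTF-8 and rejects overlong encodings, UTF-16 surrogates and code points above U+10FFFF, which is what a "reject anything that is not UTF-8" policy would need.

```c
#include <stddef.h>

/*
 * Return 1 if s[0..len) is well-formed UTF-8, 0 otherwise.  Rejects
 * overlong encodings, surrogates (U+D800..U+DFFF), code points above
 * U+10FFFF and truncated sequences.
 */
static int is_valid_utf8(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		size_t n;		/* number of continuation bytes */
		unsigned long cp;	/* decoded code point */

		if (c < 0x80) {
			i++;		/* plain ASCII */
			continue;
		} else if (c >= 0xC2 && c <= 0xDF) {
			n = 1;
			cp = c & 0x1F;
		} else if (c >= 0xE0 && c <= 0xEF) {
			n = 2;
			cp = c & 0x0F;
		} else if (c >= 0xF0 && c <= 0xF4) {
			n = 3;
			cp = c & 0x07;
		} else {
			return 0; /* stray continuation or invalid lead byte */
		}
		if (len - i <= n)
			return 0; /* sequence runs past the end */
		for (size_t k = 1; k <= n; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return 0;
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}
		if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
		    cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return 0; /* overlong, out of range or surrogate */
		i += n + 1;
	}
	return 1;
}
```

Under this policy a Latin-1 path such as the bytes "caf\xe9" would make the whole command fail, while "caf\xc3\xa9" (the UTF-8 spelling of the same name) would pass.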
I'm far from an expert but I think the normal solution is to base64-encode
byte strings so the caller can get the original bytes back. We'd need to
do this for all paths. Even if we could reliably guess the encoding (which
I'm not sure we can) and re-encode it as UTF-8, the caller wouldn't know
whether the path was really UTF-8 or had been re-encoded, in which case
they would need to convert it back to the original encoding to use it.
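To illustrate the base64 approach, here is a minimal RFC 4648 encoder in C (a hypothetical helper, not git's code). A path such as the Latin-1 bytes "caf\xe9", which is not valid UTF-8, becomes the ASCII string "Y2Fm6Q==", which any JSON decoder accepts and which the caller can decode back to the exact original bytes.

```c
#include <stddef.h>

static const char b64_alphabet[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
 * Base64-encode len bytes of src (RFC 4648, '=' padding) into dst,
 * which must have room for 4 * ((len + 2) / 3) + 1 bytes.  src may hold
 * arbitrary bytes, such as a path that is not valid UTF-8.
 */
static void base64_encode(const unsigned char *src, size_t len, char *dst)
{
	size_t i;

	/* Each full group of 3 input bytes becomes 4 output characters. */
	for (i = 0; i + 2 < len; i += 3) {
		unsigned long v = ((unsigned long)src[i] << 16) |
				  ((unsigned long)src[i + 1] << 8) | src[i + 2];
		*dst++ = b64_alphabet[(v >> 18) & 0x3F];
		*dst++ = b64_alphabet[(v >> 12) & 0x3F];
		*dst++ = b64_alphabet[(v >> 6) & 0x3F];
		*dst++ = b64_alphabet[v & 0x3F];
	}
	/* One or two leftover bytes are padded with '='. */
	if (i < len) {
		unsigned long v = (unsigned long)src[i] << 16;

		if (i + 1 < len)
			v |= (unsigned long)src[i + 1] << 8;
		*dst++ = b64_alphabet[(v >> 18) & 0x3F];
		*dst++ = b64_alphabet[(v >> 12) & 0x3F];
		*dst++ = (i + 1 < len) ? b64_alphabet[(v >> 6) & 0x3F] : '=';
		*dst++ = '=';
	}
	*dst = '\0';
}
```

The consumer still has to know that such a field holds base64 rather than text; how that would be signalled in the JSON is a design question this sketch does not settle.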

>> On the subject of paths do you plan to support the equivalent of "git
>> rev-parse --git-path"?
> Hmmmm... Not in the way that it works in rev-parse, as it may bloat this
> command with other things that aren't exactly metadata.

That's a shame as I think we should be encouraging users to use "git
rev-parse --git-path" rather than building their own paths using "git
rev-parse --git-dir". The latter is easy to get wrong, for example by
assuming the index resides at "$GIT_DIR/index" when "$GIT_INDEX_FILE" is
set, or, when running a command from a worktree, by assuming a path is
under "$GIT_DIR" when it actually resides under "$GIT_COMMON_DIR". If this
command is going to return "$GIT_DIR" and "$GIT_WORK_TREE" then I don't
see why it should not be able to provide other paths.
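To show how much a script has to get right when it builds paths from "git rev-parse --git-dir", here is a deliberately simplified, hypothetical resolver in C. It is not git's implementation: it only honours $GIT_INDEX_FILE and a small subset of the entries shared through $GIT_COMMON_DIR, and ignores other relocations such as $GIT_OBJECT_DIRECTORY and the per-worktree exceptions under refs/.

```c
#include <stdio.h>
#include <string.h>

/*
 * A deliberately small subset of the entries that live in
 * $GIT_COMMON_DIR rather than in a worktree's own $GIT_DIR.  git's
 * real list is longer and has per-worktree exceptions.
 */
static const char *const shared_entries[] = {
	"config", "hooks", "objects", "packed-refs", "refs", NULL
};

/* Return 1 if path is a shared entry or lies underneath one. */
static int is_shared(const char *path)
{
	for (const char *const *e = shared_entries; *e; e++) {
		size_t n = strlen(*e);

		if (!strncmp(path, *e, n) && (path[n] == '\0' || path[n] == '/'))
			return 1;
	}
	return 0;
}

/*
 * Hypothetical resolver: index_file is the value of $GIT_INDEX_FILE or
 * NULL if it is unset.  Returns 0 on success, -1 if out is too small.
 */
static int resolve_git_path(char *out, size_t outsz, const char *git_dir,
			    const char *common_dir, const char *index_file,
			    const char *path)
{
	int n;

	if (!strcmp(path, "index") && index_file)
		n = snprintf(out, outsz, "%s", index_file);
	else if (is_shared(path))
		n = snprintf(out, outsz, "%s/%s", common_dir, path);
	else
		n = snprintf(out, outsz, "%s/%s", git_dir, path);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

In a linked worktree with git_dir "/repo/.git/worktrees/wt" and common_dir "/repo/.git", "index" and "HEAD" resolve under the worktree's own directory while "refs/heads/main" resolves under "/repo/.git", which is exactly the distinction a naive "$GIT_DIR/<path>" gets wrong.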

>> I'm not sure what the future plans for this command are but when I'm
>> scripting around git it would be nice to have a single process that I
>> could query for the things currently returned by "git rev-parse", "git var"
>> and "git config".
> My concern here is that the main motivation for this new command is that
> rev-parse has too many responsibilities. Giving too many responsibilities to
> this new command may turn it into a new rev-parse and create an XKCD 927 [1]
> situation.

I should have been clearer that I was talking about the path and
repository information options of "git rev-parse". Those combined with
"git var" and "git config" are all repository settings. Having a unified
interface to them would be an improvement on the status quo where users
have to know which command to call to query different settings. There
would be a clear focus on returning repository settings, which I think
is very different from "git rev-parse" that combines revision parsing,
command line parsing, shell quoting and repository information. I don't
think you necessarily need to implement them as part of this project but
we should design the input and output formats so that the command can be
extended in the future.
Best Wishes
Phillip