Re: [GSoC RFC PATCH v2 0/7] repo-info: add new command for retrieving repository info


Hi Lucas

On 23/06/2025 19:49, Lucas Seiki Oshiro wrote:

>> I think using an output format generated by 'printf("%s\n%s\0", key,
>> value)' would be easier to parse. This format matches that used by 'git
>> config --list -z'.

> Thanks for your suggestion! However, this still breaks in the corner case
> mentioned by Junio in
> https://lore.kernel.org/git/xmqqikl3mtx2.fsf@gitster.g/:
> when a value contains an LF, which would be possible in the (yet to be
> implemented) path values.

The reason git uses NUL termination for other commands is to prevent breaking the output when values contain newlines. The output format I'm suggesting is

    <key><LF><value><NUL>

so the output for "path.git-dir" written as a C string would be

    "path.git-dir\n/home/phil/src/git/.git\0"

The value can safely contain newlines because it is terminated by '\0', and the key ends at the first newline, which is unambiguous as long as keys never contain one. The reason that "git config --list -z" exists is to provide an unambiguous output format, as config values can contain newlines.
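A minimal C sketch of writing and reading "<key><LF><value><NUL>" records follows. The names encode_record(), next_record() and struct record are hypothetical and not part of git. One detail worth noting: a literal 'printf("%s\n%s\0", key, value)' would not emit the NUL in C, because the embedded "\0" ends the format string, so the sketch copies the terminator explicitly. The reader splits each record at its first LF, which assumes keys never contain a newline; the value may contain anything except NUL.

```c
#include <stddef.h>
#include <string.h>

/* One parsed record; both fields point into the caller's buffer. */
struct record {
	const char *key;   /* not NUL-terminated: ends at the LF */
	size_t key_len;
	const char *value; /* NUL-terminated by the record terminator */
	size_t value_len;
};

/*
 * Hypothetical encoder: write "<key>\n<value>\0" to dst and return the
 * number of bytes written. The NUL is copied explicitly; an embedded
 * "\0" in a printf() format string would just end the format.
 */
static size_t encode_record(char *dst, const char *key, const char *value)
{
	size_t klen = strlen(key), vlen = strlen(value);

	memcpy(dst, key, klen);
	dst[klen] = '\n';
	memcpy(dst + klen + 1, value, vlen + 1); /* includes value's NUL */
	return klen + 1 + vlen + 1;
}

/*
 * Hypothetical decoder: parse the record at *cursor. Returns 1 and
 * advances *cursor past the NUL on success, 0 at end of input and -1
 * if the record is truncated or has no LF. The key ends at the FIRST
 * LF, so values may contain newlines but keys must not.
 */
static int next_record(const char **cursor, const char *end,
		       struct record *rec)
{
	const char *start = *cursor, *nul, *lf;

	if (start >= end)
		return 0;
	nul = memchr(start, '\0', (size_t)(end - start));
	if (!nul)
		return -1; /* missing terminator */
	lf = memchr(start, '\n', (size_t)(nul - start));
	if (!lf)
		return -1; /* missing key/value separator */
	rec->key = start;
	rec->key_len = (size_t)(lf - start);
	rec->value = lf + 1;
	rec->value_len = (size_t)(nul - lf - 1);
	*cursor = nul + 1;
	return 1;
}
```

A producer would fill a buffer with encode_record() and hand it to fwrite(); a consumer calls next_record() in a loop until it returns 0, and a value such as "a\nb" comes back intact.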

>> I've not seen any discussion of how paths are going to be encoded in the
>> JSON output. As I understand it, some JSON decoders only accept UTF-8 input,
>> but the paths reported by git are arbitrary NUL-terminated byte sequences.
>> How is one expected to parse the output for a non-UTF-8 encoded path using
>> Rust's JSON decoding, for example?

[...]

> The first solution that I can think of is to check if the sequence is a valid
> UTF-8 bytestring, aborting the entire command if it's not, which would be
> better than just guessing the charset and re-encoding it as UTF-8. However,
> I don't know how hard it would be to do.
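For a sense of what the check described above involves, here is a hedged C sketch of a UTF-8 validator; the name is_valid_utf8() is hypothetical. It accepts only well-formed UTF-8 as defined by RFC 3629, rejecting stray continuation bytes, truncated sequences, overlong forms, UTF-16 surrogates and code points above U+10FFFF. It only detects the problem: with this approach the command would refuse to report such paths rather than represent them.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if s[0..len) is well-formed UTF-8
 * (RFC 3629), else 0.
 */
static int is_valid_utf8(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		size_t n;         /* number of continuation bytes */
		unsigned long cp; /* decoded code point */

		if (c < 0x80) {
			i++; /* plain ASCII */
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			n = 1; cp = c & 0x1F;
		} else if ((c & 0xF0) == 0xE0) {
			n = 2; cp = c & 0x0F;
		} else if ((c & 0xF8) == 0xF0) {
			n = 3; cp = c & 0x07;
		} else {
			return 0; /* stray continuation or invalid lead byte */
		}
		if (len - i <= n)
			return 0; /* truncated sequence */
		for (size_t k = 1; k <= n; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return 0; /* expected a continuation byte */
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}
		if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
		    (n == 3 && cp < 0x10000))
			return 0; /* overlong encoding */
		if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return 0; /* out of range or surrogate */
		i += n + 1;
	}
	return 1;
}
```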

I'm far from an expert, but I think the normal solution is to base64-encode byte strings so the caller can get the original bytes back. We'd need to do this for all paths. Even if we could reliably guess the encoding (which I'm not sure we can) and re-encode it as UTF-8, the caller wouldn't know whether the path was really UTF-8 or had been re-encoded, in which case they would need to convert it back to the original encoding to use it.
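As a sketch of the base64 approach, the following C code encodes the raw bytes of a path using the standard alphabet with padding from RFC 4648. The function names are hypothetical and this is not a proposal for a specific output field; the point is that any byte sequence, such as the Latin-1 name "caf\xe9" that is not valid UTF-8, becomes plain ASCII that every JSON decoder accepts and that the caller can decode back to the identical bytes.

```c
#include <stddef.h>
#include <string.h>

static const char b64_alphabet[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
 * Encode len bytes of src as padded base64 (RFC 4648, section 4).
 * dst must have room for 4 * ((len + 2) / 3) + 1 bytes. Returns the
 * length of the encoded text, excluding the terminating NUL.
 */
static size_t base64_encode(const unsigned char *src, size_t len, char *dst)
{
	size_t i = 0, o = 0;

	/* Each full group of 3 input bytes becomes 4 output characters. */
	for (; i + 2 < len; i += 3) {
		unsigned long v = ((unsigned long)src[i] << 16) |
				  ((unsigned long)src[i + 1] << 8) | src[i + 2];
		dst[o++] = b64_alphabet[(v >> 18) & 0x3F];
		dst[o++] = b64_alphabet[(v >> 12) & 0x3F];
		dst[o++] = b64_alphabet[(v >> 6) & 0x3F];
		dst[o++] = b64_alphabet[v & 0x3F];
	}
	/* A trailing group of 1 or 2 bytes is padded with '='. */
	if (i < len) {
		unsigned long v = (unsigned long)src[i] << 16;

		if (i + 1 < len)
			v |= (unsigned long)src[i + 1] << 8;
		dst[o++] = b64_alphabet[(v >> 18) & 0x3F];
		dst[o++] = b64_alphabet[(v >> 12) & 0x3F];
		dst[o++] = (i + 1 < len) ? b64_alphabet[(v >> 6) & 0x3F] : '=';
		dst[o++] = '=';
	}
	dst[o] = '\0';
	return o;
}

/* Hypothetical wrapper: paths are NUL-terminated byte strings. */
static size_t base64_encode_path(const char *path, char *dst)
{
	return base64_encode((const unsigned char *)path, strlen(path), dst);
}
```

For example, the four bytes "caf\xe9" encode to "Y2Fm6Q==". Encoding every path this way, rather than only the invalid ones, means the caller never has to guess whether a given value was transformed.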

>> On the subject of paths do you plan to support the equivalent of "git
>> rev-parse --git-path"?

> Hmmmm... In the way that it works under rev-parse, no, as it may bloat this
> command with other things that aren't exactly metadata.

That's a shame, as I think we should be encouraging users to use "git rev-parse --git-path" rather than building their own paths using "git rev-parse --git-dir". The latter is easy to get wrong: for example, assuming the index resides at "$GIT_DIR/index" when "$GIT_INDEX_FILE" is set, or, when running a command from a worktree, assuming a path is under "$GIT_DIR" when it actually resides under "$GIT_COMMON_DIR". If this command is going to return "$GIT_DIR" and "$GIT_WORK_TREE", then I don't see why it should not be able to provide other paths.
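To illustrate the pitfall, here is a hedged C sketch of what a script has to reimplement just to locate the index when it starts from "git rev-parse --git-dir". The function resolve_index_path() is hypothetical. It honours $GIT_INDEX_FILE, which the naive "$GIT_DIR/index" misses, but it is still a simplification that handles only the index: other paths need rules of their own, for example the object store lives under $GIT_COMMON_DIR when running from a linked worktree and can be relocated with $GIT_OBJECT_DIRECTORY. "git rev-parse --git-path index" already takes these relocation variables into account.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: locate the index the way a script built on
 * "git rev-parse --git-dir" would have to. index_env is the value of
 * $GIT_INDEX_FILE (NULL when unset) and git_dir is $GIT_DIR. Writes
 * the path to buf and returns 0, or -1 if buf is too small.
 * Simplified: other paths need their own relocation rules.
 */
static int resolve_index_path(const char *git_dir, const char *index_env,
			      char *buf, size_t size)
{
	/* $GIT_INDEX_FILE, if set, replaces the default entirely. */
	const char *base = index_env ? index_env : git_dir;
	const char *suffix = index_env ? "" : "/index";
	size_t blen = strlen(base), slen = strlen(suffix);

	if (blen + slen + 1 > size)
		return -1;
	memcpy(buf, base, blen);
	memcpy(buf + blen, suffix, slen + 1); /* copies the NUL too */
	return 0;
}
```

A script would call it as resolve_index_path(git_dir, getenv("GIT_INDEX_FILE"), buf, sizeof(buf)), which is exactly the kind of duplicated logic that asking git for the path directly avoids.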

>> I'm not sure what the future plans for this command are, but when I'm
>> scripting around git it would be nice to be able to use a single process that I
>> could query for the things currently returned by "git rev-parse", "git var"
>> and "git config".

> My concern here is that the main motivation for this new command is that
> rev-parse has too many responsibilities. Giving too many responsibilities to
> this new command may turn it into a new rev-parse and create an XKCD 927 [1]
> situation.

I should have been clearer that I was talking about the path and repository information options of "git rev-parse". Those, together with "git var" and "git config", all deal with repository settings. Having a unified interface to them would be an improvement on the status quo, where users have to know which command to call to query different settings. There would be a clear focus on returning repository settings, which I think is very different from "git rev-parse", which combines revision parsing, command line parsing, shell quoting and repository information. I don't think you necessarily need to implement them as part of this project, but we should design the input and output formats so that the command can be extended in the future.

Best Wishes

Phillip



