Re: [GSoC RFC PATCH v2 0/7] repo-info: add new command for retrieving repository info

> Hi Lucas

Hi, Phillip, thanks for joining this discussion!

> I think using an output format generated by 'printf("%s\n%s\0", key,
> value)' would be easier to parse. This format matches that used by 'git
> config --list -z'.

Thanks for the suggestion! However, this still breaks in the corner case
mentioned by Junio in
https://lore.kernel.org/git/xmqqikl3mtx2.fsf@gitster.g/:
a value may contain an LF, which will be possible in the (yet to be
implemented) path values.
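
To make the corner case concrete, here is a rough sketch in Python (not Git
code, and not part of this series). The key name "path.gitdir" and the path
are made up for illustration, and the reader is assumed to be a simple
consumer that treats every LF as a field separator:

```python
# Hypothetical record in the proposed "key\nvalue\0" layout.
# The key name and the path below are invented for illustration only.
def format_record(key: bytes, value: bytes) -> bytes:
    return key + b"\n" + value + b"\0"


def read_line_oriented(stream: bytes) -> list:
    # Assumption: a naive consumer that splits on every LF.
    return stream.rstrip(b"\0").split(b"\n")


record = format_record(b"path.gitdir", b"/tmp/odd\nname/.git")
fields = read_line_oriented(record)

# The consumer expects two fields (key, value) but gets three,
# because the LF inside the path is taken as a separator.
print(len(fields), fields)
```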

> I've not seen any discussion of how paths are going to be encoded in the
> JSON output. As I understand it some JSON decoders only accept utf8 input
> but the paths reported by git are arbitrary NUL terminated byte sequences.
> How is one expected to parse the output for a non utf8 encoded path using
> rust's JSON decoding for example?

For now, I'm using the jw_* functions directly. They format strings with the
function append_quoted_string, introduced in 75459410ed (json_writer: new
routines to create JSON data, 2018-07-13). This issue was also discussed when
that function was introduced:

"""
    We say "JSON-like" because we do not enforce the Unicode (usually UTF-8)
    requirement on string fields.  Internally, Git does not necessarily have
    Unicode/UTF-8 data for most fields, so it is currently unclear the best
    way to enforce that requirement.  For example, on Linux pathnames can
    contain arbitrary 8-bit character data, so a command like "status" would
    not know how to encode the reported pathnames.  We may want to revisit
    this (or double encode such strings) in the future.
"""

So it looks like "the future" is soon :-). In this RFC I'm not handling
paths yet, and I can't propose a proper solution right now, as I honestly
know very little about UTF-8 encoding...
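
As a small illustration of the problem Phillip describes, the sketch below
feeds a JSON-like document with a non-UTF-8 byte to Python's standard json
module. This is only an example of one strict decoder, not a statement about
every JSON library; the "path" key and its value are hypothetical:

```python
import json

# Hypothetical output: the "path" value contains the single byte 0xE9
# (Latin-1 "e acute"), which is not valid UTF-8 on its own.
raw = b'{"path": "caf\xe9"}'


def try_decode(data: bytes):
    # json.loads() accepts bytes and decodes them as UTF-8 (or UTF-16/32)
    # before parsing, so invalid byte sequences are rejected up front.
    try:
        return json.loads(data)
    except UnicodeDecodeError as err:
        return "rejected: " + err.reason


print(try_decode(raw))
print(try_decode(b'{"path": "caf\xc3\xa9"}'))  # same name, UTF-8 encoded
```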

The first solution I can think of is to check whether the sequence is a
valid UTF-8 bytestring and abort the entire command if it's not, which would
be better than guessing the charset and re-encoding it as UTF-8. However,
I don't know how hard that would be to do.
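
The "validate or abort" idea could look roughly like the sketch below. It is
written in Python only to show the logic; an implementation in Git would be
in C and would look different. The function names are hypothetical:

```python
def is_valid_utf8(path: bytes) -> bool:
    # Strict decoding fails on any byte sequence that is not valid UTF-8.
    try:
        path.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def check_paths_or_abort(paths):
    # Abort (here: raise) on the first invalid path instead of guessing a
    # charset and re-encoding.
    for p in paths:
        if not is_valid_utf8(p):
            raise ValueError(f"path is not valid UTF-8: {p!r}")
    return [p.decode("utf-8") for p in paths]


print(is_valid_utf8(b"caf\xc3\xa9"))  # UTF-8 encoded name
print(is_valid_utf8(b"caf\xe9"))      # Latin-1 encoded name
```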

> On the subject of paths do you plan to support the equivalent of "git
> rev-parse --git-path"?

Hmmmm... Not in the way it works under rev-parse, as that may bloat this
command with other things that aren't exactly metadata.

> I'm not sure what the future plans for this command are but when I'm
> scripting around git it would be nice to be able to a single process that I
> could query for the things currently returned by "git rev-parse", "git var"
> and "git config"

My concern here is that the main motivation for this new command is that
rev-parse has too many responsibilities. Giving too many responsibilities to
this new command may turn it into a new rev-parse and create an XKCD 927 [1]
situation.

> 
> Best Wishes
> 
> Phillip
> 

Thanks again for shedding more light on this discussion! These first patches
only output hardcoded strings from Git, and dealing with Unicode is
something I'll really need to think about.

[1] https://xkcd.com/927/



