Re: [RFE] Add JSON output to git log commands

On 2025-08-17 at 20:17:46, Ron Ziroby Romero wrote:
> I would like to add JSON output to the git log command.
> 
> ## Motivation
> 
> Machine parsing of git log output is prevalent, but git only provides
> human-readable output. Having git output JSON directly solves problems
> with the format option or third-party tools. Git has the information
> in a machine-readable format. It should output in a machine-readable
> format. JSON is ubiquitous and easy to generate, and therefore, it
> makes sense to output JSON.

To be clear, Git already provides plenty of machine-readable formats.
They're not typically structured in a standard way like JSON or CBOR,
but many forges and other tools do successfully parse Git output.

> The author of one of the third-party tools says that JSON output is
> the natural evolution of the Unix philosophy and should be done
> natively for all tools[4].
> 
> ## Current behaviour
> 
> Git log can output human-readable output in several ways. However,
> outputting in JSON requires third-party tools or hacking pretty
> output.
> 
> ## Proposed enhancement
> 
> Add a –pretty=json flag to output logs in JSON format.

I'd like to hear how you plan to deal with non-UTF-8 byte strings since
JSON must always be valid Unicode.  Most data in Git is only by
convention UTF-8 and can actually be in other encodings or no encoding
at all: refs, commit messages[0], and author and committer idents.

A good approach would be to add a byte string entry type to the JSON
writer and use it for these fields.  If the data is not valid UTF-8, or
if it contains a % sign, then you URL-encode it.  Other encodings are
possible as well, but not JSON escapes[1].
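
As a rough illustration of that rule (a sketch only, not a proposal for
the actual JSON writer, which would be in C), here is how it could look
in Python.  The helper names `encode_field` and `decode_field` are
hypothetical, and percent-encoding every byte outside the unreserved set
is just one possible choice:

```python
import json
from urllib.parse import quote_from_bytes, unquote_to_bytes


def encode_field(raw: bytes) -> str:
    """Return raw as text if it is valid UTF-8 without '%', else URL-encode it."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: percent-encode so the JSON string stays valid.
        return quote_from_bytes(raw, safe="")
    if "%" in text:
        # A literal '%' must be escaped too, or the reader could not tell
        # encoded values from plain ones.
        return quote_from_bytes(raw, safe="")
    return text


def decode_field(value: str) -> bytes:
    """Inverse of encode_field: a '%' means the value was URL-encoded."""
    if "%" in value:
        return unquote_to_bytes(value)
    return value.encode("utf-8")


# A Latin-1 encoded ident survives a round trip through JSON.
ident = b"Jos\xe9"
doc = json.dumps({"author": encode_field(ident)})
assert decode_field(json.loads(doc)["author"]) == ident
```

The point is that the reader can always recover the original bytes,
whatever encoding (or lack of one) they were in.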

Another good option would be to use CBOR instead, since it provides
native byte strings.

Bad options would be to replace invalid bytes with U+FFFD, since that
makes the output useless when you hit one of these cases (and I can tell
you from $DAYJOB that they're not that uncommon), or to just shovel raw
bytes into the output and let the reader be sad (which will definitely
make the output useless as well as result in angry bug reports to the
list).
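
To make both failure modes concrete, a small Python sketch (the byte
strings are made-up examples, not taken from any real repository):

```python
import json

# Two different Latin-1 encoded idents.
a = b"Jos\xe9"
b = b"Jos\xe8"

# Replacing invalid bytes with U+FFFD collapses them to the same text,
# so the original bytes can no longer be recovered.
lossy_a = a.decode("utf-8", errors="replace")
lossy_b = b.decode("utf-8", errors="replace")
assert lossy_a == lossy_b == "Jos\ufffd"

# Shoveling the raw bytes into the output yields something a strict
# JSON parser refuses to read at all.
raw_json = b'{"author": "' + a + b'"}'
try:
    json.loads(raw_json)
    parsed = True
except ValueError:  # UnicodeDecodeError is a subclass of ValueError
    parsed = False
assert parsed is False
```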

As a note, I think you want `--pretty`, not `-pretty` (we use two dashes
for long options).

[0] Yes, they declare an encoding, but it isn't always correct and the
encoding someone used is not always available on every system.  I saw
someone in the Linux kernel history write "latin1", which is not a valid
encoding according to Ruby, which I was using to parse it.
[1] `\u00ff` represents U+00FF, which is equivalent to the byte sequence
0xc3 0xbf, not 0xff.
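
A quick way to check this in Python, using only the standard `json`
module:

```python
import json

# The JSON escape \u00ff decodes to the code point U+00FF ...
s = json.loads('"\\u00ff"')
assert s == "\u00ff"

# ... which is two bytes in UTF-8, not the single byte 0xff.
utf8 = s.encode("utf-8")
assert utf8 == b"\xc3\xbf"
assert utf8 != b"\xff"
```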
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA


