On 2025-08-17 at 20:17:46, Ron Ziroby Romero wrote:
> I would like to add JSON output to the git log command.
>
> ## Motivation
>
> Machine parsing of git log output is prevalent, but git only provides
> human-readable output. Having git output JSON directly solves problems
> with the format option or third-party tools. Git has the information
> in a machine-readable format. It should output in a machine-readable
> format. JSON is ubiquitous and easy to generate, and therefore, it
> makes sense to output JSON.

Git provides plenty of machine-readable formats, to be clear. They're
not typically structured in a standard way like JSON or CBOR, but many
forges and other tools do successfully parse Git output with a variety
of tools.

> The author of one of the third-party tools says that JSON output is
> the natural evolution of the Unix philosophy and should be done
> natively for all tools[4].
>
> ## Current behaviour
>
> Git log can output human-readable output in several ways. However,
> outputting in JSON requires third-party tools or hacking pretty
> output.
>
> ## Proposed enhancement
>
> Add a –pretty=json flag to output logs in JSON format.

I'd like to hear how you plan to deal with non-UTF-8 byte strings since
JSON must always be valid Unicode. Most data in Git is only by
convention UTF-8 and can actually be in other encodings or no encoding
at all: refs, commit messages[0], and author and committer idents.

What would be a good idea is to add a byte string entry to the JSON
writer and use it for these formats. If the data is not valid UTF-8, or
if it contains a % sign, then you URL-encode it. Other encodings are
possible as well, but not JSON escapes[1]. Other good options would be
to use CBOR instead, since it provides native byte strings.
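
A minimal sketch of the byte-string scheme described above, in Python
rather than Git's C. The names `encode_byte_string` and
`decode_byte_string` are hypothetical and are not part of Git's JSON
writer, and the choice to leave printable ASCII unescaped is an
assumption. The text only says to URL-encode when the data is not valid
UTF-8 or contains a % sign.

```python
import json
from urllib.parse import unquote_to_bytes


def encode_byte_string(data: bytes) -> str:
    """Hypothetical helper: make arbitrary bytes safe for a JSON string."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    if text is not None and "%" not in text:
        # Common case: valid UTF-8 with no '%', passed through unchanged.
        return text
    # Otherwise percent-encode. Assumption: printable ASCII other than '%'
    # stays readable; every other byte becomes %XX.
    return "".join(
        chr(b) if 0x20 <= b < 0x7F and b != 0x25 else f"%{b:02X}"
        for b in data
    )


def decode_byte_string(text: str) -> bytes:
    """Inverse of encode_byte_string: always percent-decode."""
    # Passed-through strings contain no '%', so decoding them is a no-op
    # apart from re-encoding the text as UTF-8.
    return unquote_to_bytes(text)


# An author ident stored as Latin-1, which is not valid UTF-8.
raw_ident = b"Jos\xe9 <jose@example.com>"
document = json.dumps({"author": encode_byte_string(raw_ident)})
recovered = decode_byte_string(json.loads(document)["author"])

# A JSON \u escape names a code point, not a byte, so it cannot carry 0xff.
escaped = json.loads('"\\u00ff"').encode("utf-8")
```

Because a passed-through string never contains a % sign, a reader can
percent-decode every byte-string field unconditionally and get the
original bytes back, which neither U+FFFD substitution nor a `\u00ff`
escape allows.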

Bad options would be to use U+FFFD, since that makes the output useless
when you hit one of these cases (and I can tell you from $DAYJOB that
they're not that uncommon) and to just shovel bytes into the output and
let the reader be sad (which will definitely make the output useless as
well as result in angry bug reports to the list).

As a note, I think you want `--pretty`, not `-pretty` (we use two dashes
for long options).

[0] Yes, they declare an encoding, but it isn't always correct and the
encoding someone used is not always available on every system. I saw
someone in the Linux kernel history write "latin1", which is not a valid
encoding according to Ruby, which I was using to parse it.

[1] `\u00ff` represents U+00FF, which is equivalent to the byte sequence
0xc3 0xbf, not 0xff.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA