Re: [GSoC RFC PATCH v2 1/7] repo-info: declare the repo-info command

On 25/07/07 08:01AM, Patrick Steinhardt wrote:
> On Fri, Jul 04, 2025 at 06:40:11PM -0300, Lucas Seiki Oshiro wrote:
> > > Would it make sense to maybe have such whole-repo commands
> > > grouped together in a `git repo` top-level command? E.g. `git repo info`
> > > for your command, `git repo size` to gather information about the repo
> > > size.
> > 
> > That sounds very nice to me! In fact, making this a home for
> > statistics as well is something I considered while writing the first
> > versions of my GSoC proposal.
> > 
> > And what about merging the two code bases into a single API? Something like:
> > 
> > ```
> > git repo-info layout.bare references.format survey.commit-count
> > {
> >   "layout": {
> >     "bare": true
> >   },
> >   "references": {
> >     "format": "files"
> >   },
> >   "survey": {
> >     "commit-count": 42
> >   }
> > }
> > ```
> > 
> > ?
> 
> We could in theory do that. But there are two things we need to be
> cautious about:
> 
>   1. We should be mindful about what specifically this tool is about. It
>      shouldn't become the next tool that does way too many different
>      things.
> 
>   2. One of the ideas of git-survey(1) is to eventually replace
>      git-sizer(1). This will require very specific presentation formats
>      that aren't really compatible with any of the other information.
> 
> Of these two, I think the second is the more important reason why
> git-survey(1) should exist as a standalone tool, either as a top-level
> command or as a subcommand.

As Patrick mentioned, the focus for git-survey(1) is to be an eventual
substitute for git-sizer(1). For the initial implementation I was
imagining a simple plaintext format that outputs key/value pairs and
looks something like the following example:

  references.branches.count=15
  references.tags.count=2
  references.remotes.count=5
  references.others.count=1
  objects.commits.count=50
  objects.commits.total_size=1234567
  objects.commits.max_size.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
  objects.commits.max_size.size=1234
  objects.commits.max_parents.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
  objects.commits.max_parents.count=8
  objects.trees.count=100
  objects.trees.total_size=12345
  objects.trees.total_tree_entries=999
  objects.trees.max_tree_entries.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
  objects.trees.max_tree_entries.count=99
  objects.blobs.count=142
  objects.blobs.total_size=99999999
  objects.blobs.max_size.oid=1817dc08b8ea00fce4cd1fb6bc75713ad00a74d3
  objects.blobs.max_size.size=999999
  objects.tags.count=1
  repo.max_depth=999
  <etc...>

The command will also need to eventually support other output formats,
namely a more human-friendly table format that provides something
similar to git-sizer(1). As laid out above, this looks like it could
also work well with the git-repo-info(1) JSON format. This makes me
wonder if we should add this functionality as a separate flag for
git-repo-info(1), maybe something like `--stats`, which would append
the info to the output. If we want a clearer distinction, though, we
could implement this as a separate subcommand.
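
Since every key in the plaintext format above is a dotted path, the
same data could be regrouped into the nested shape git-repo-info(1)
prints. A minimal sketch of that mapping in Python, assuming input
shaped exactly like the example above; `parse_kv` and `nest` are
hypothetical helpers, not part of any proposed patch, and values are
kept as strings because the format does not define types yet:

```python
import json


def parse_kv(text):
    """Parse 'dotted.key=value' lines into a flat dict, skipping blank lines."""
    flat = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first '=' only, so an '=' inside a value is kept.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        flat[key] = value
    return flat


def nest(flat):
    """Turn {'a.b.c': v} into {'a': {'b': {'c': v}}}."""
    tree = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
            # A key cannot be both a value and a group of sub-keys.
            if not isinstance(node, dict):
                raise ValueError(f"key conflict at {part!r} in {key!r}")
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"key conflict at {key!r}")
        node[leaf] = value
    return tree


# A few lines copied from the example output above.
sample = """\
references.branches.count=15
references.tags.count=2
objects.commits.count=50
objects.commits.max_size.size=1234
"""

print(json.dumps(nest(parse_kv(sample)), indent=2))
```

Running it prints the four sample keys regrouped under `references` and
`objects`. Keeping values as strings avoids deciding here whether, say,
an object ID should ever be read as a number.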

For a more human-readable format, maybe we could still implement a
standalone git-survey(1) that is more of a porcelain command and uses
git-repo-info(1) under the hood. I think other information, such as the
reference format and object format, may also be useful to include in
git-survey(1) output.

> > During our meetings, Karthik suggested (I'm planning to do it later)
> > also allowing an entire category to be requested instead of only
> > individual fields. Then, this would also be possible:
> > 
> > ```
> > $ git repo-info survey
> > {
> >   "survey": {
> >     "commit-count": 42,
> >     "blob-count": 1234
> >   }
> > }
> > ```
> 
> It raises another question, though: if we ever were to add `--all`, we'll
> need to be a bit careful about what kind of information we add to this
> tool. All of the information proposed so far can be computed rather
> trivially. But computing repository sizes has way higher computational
> complexity and may easily take seconds, maybe even minutes in large
> repositories.
> 
> That, to me, further points in the direction of giving those two tools a
> common top-level command (`git repo info`, `git repo survey`), but to
> not mix concerns too much with one another.

Collecting the information for git-survey(1) is certainly more
computationally expensive, so there should be a way to run the command without performing
the more expensive checks if the user doesn't want them. At the same
time, I think it may be nice to have a way for a user to request a dump
of "interesting" repository info via a single command.
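
One way to get both properties is to mark each key as cheap or
expensive and skip the expensive ones when dumping everything, unless
the user opts in. A rough sketch of that idea in Python; the `Field`
type, the `FIELDS` registry, `collect()` and the `include_expensive`
switch are all hypothetical, and the lambdas are stubs returning
placeholder values from the examples in this thread rather than real
repository lookups. Naming a key, or a whole category as Karthik
suggested, always computes it:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Field:
    compute: Callable[[], object]  # produces the value for this key
    expensive: bool = False        # True if it needs e.g. a full object walk


# Hypothetical registry; each lambda is a stub for a real lookup.
FIELDS: Dict[str, Field] = {
    "layout.bare": Field(lambda: True),
    "references.format": Field(lambda: "files"),
    "objects.commits.count": Field(lambda: 50, expensive=True),
}


def expand(name: str) -> List[str]:
    """Match a full key, or every key in a category such as 'objects'."""
    matches = [k for k in FIELDS if k == name or k.startswith(name + ".")]
    if not matches:
        raise KeyError(f"unknown key or category: {name}")
    return matches


def collect(requested: Optional[Iterable[str]] = None,
            include_expensive: bool = False) -> Dict[str, object]:
    """Compute the named keys/categories; with none named, compute every
    cheap key, plus the expensive ones if include_expensive is set."""
    if requested is None:
        names = [k for k, f in FIELDS.items()
                 if include_expensive or not f.expensive]
    else:
        names = []
        for req in requested:
            for k in expand(req):
                if k not in names:  # keep order, drop duplicates
                    names.append(k)
    return {k: FIELDS[k].compute() for k in names}


print(collect())                        # cheap keys only
print(collect(include_expensive=True))  # everything
print(collect(["objects"]))             # one category, even if expensive
```

With this split, a bare invocation stays fast, while survey-style keys
are computed only when named or when the caller explicitly opts in.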

> > But I don't know what Justin's plans for git-survey are: whether it
> > will be a porcelain command for showing those stats to the user, or
> > whether it is meant to be parsed like this `repo-info`.

I think the intent for git-survey was to provide a more porcelain
command to display interesting repository stats to the user, but also
provide an option to print in a machine-parsable format. I like the idea
of computing everything as part of git-repo-info though. This could
allow a standalone git-survey to focus on just being a human-friendly
porcelain command. For scripted use cases, users could then just use
git-repo-info.

-Justin



