Re: [GSOC] [Proposal v1] Machine-Readable Repository Information Query Tool

On Thu, Apr 3, 2025 at 3:53 PM Patrick Steinhardt <ps@xxxxxx> wrote:
>
> On Mon, Mar 31, 2025 at 08:21:27PM +0530, JAYATHEERTH K wrote:
> > ## **Synopsis**
> > This project aims to develop a dedicated Git command that interfaces
> > with Git’s internal APIs to produce structured JSON output,
> > particularly for repository metadata. By offering a clean,
> > machine-readable format, this tool will improve automation, scripting,
> > and integration with other developer tools.
> >
> > ## **Benefits to the Community**
> > ### **1. Simplifies Automation and Scripting**
> > - Many Git commands output **human-readable text**, making automation
> > **error-prone** and **dependent on fragile parsing**.
> > - This project introduces **structured JSON output**, allowing scripts
> > and tools to consume repository metadata **directly and reliably**.
> > - No more **awkward text parsing**, `grep` hacks, or brittle `awk/sed`
> > pipelines—just **clean, structured data**.
> >
> > ### **2. Eliminates the Overuse of `git rev-parse`**
> > - `git rev-parse` is widely misused for extracting metadata, despite
> > being intended primarily for **parsing revisions**.
> > - Developers often **repurpose** it because there’s **no dedicated
> > alternative** for metadata queries.
> > - This project **corrects that gap** by introducing a **purpose-built
> > command** that is **cleaner, more intuitive, and extensible**.
> >
> > ### **3. Optimizes CI/CD Pipelines**
> > - CI/CD systems currently need **multiple Git commands** and
> > associated parsing logic to fetch basic metadata:
> >
> > ```bash
> > # Example: Gathering just a few common pieces of info
> > BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "DETACHED")
> > COMMIT=$(git rev-parse HEAD)
> > REMOTE_URL=$(git remote get-url origin 2>/dev/null || echo "no-origin")
> > # ... often requiring more commands and error handling logic.
> > ```
> > - The proposed command aims to **replace these multiple calls** with a
> > **single, efficient query** returning comprehensive, structured JSON
> > data.
> > - This **simplifies pipeline scripts**, reduces process overhead, and
> > makes CI/CD configurations **cleaner and more robust**.
>
> I already saw this in another proposal, which indicates that the project
> idea might be a bit underspecced. In any case, the goal of the project

Hey Patrick, thank you for letting me know.
I have actually been working on this proposal for a while now.
I also sent an e-mail about this specific project right before
GSoC proposals started. As far as I could see, the project had not
been discussed previously, which is why I picked it.

https://lore.kernel.org/git/CA+rGoLdvY+JdgdzgE04EJoF9KGUpd39+2S_AgpFyucP38mdFgA@xxxxxxxxxxxxxx/

I'm not sure how to proceed in this situation. I think I need some
advice from your side on this.

> isn't to write a single tool that is able to surface _all_ information
> for a Git repository. It's rather that we want to surface low-level
> information around the repository itself.
>
> The basic intent is to give the options listed in git-rev-parse(1) under
> the section "Options for Files" a better home. We have a bunch of
> command line options there that allow us to parse environment variables,
> paths, repository formats and other low-level stuff. But these aren't
> really a good fit for git-rev-parse(1) itself because that tool was
> intended to be about parsing revisions. So this is one of those
> organically grown commands that has started to accumulate all kinds of
> unrelated options that didn't have a better home elsewhere.
>

OK, that clears up a lot of things.
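
For reference, here is a minimal sketch of the kind of low-level queries
that git-rev-parse(1) answers today under "Options for Files". The
throwaway repository and the variable names are only illustrative
assumptions, and `--show-object-format` assumes Git 2.25 or newer:

```bash
#!/bin/sh
# Create a throwaway repository so the queries have something to inspect.
REPO=$(mktemp -d)
git init --quiet "$REPO"

# Each query below is a separate process, and none of them parses a revision.
GIT_DIR_PATH=$(git -C "$REPO" rev-parse --git-dir)            # path to the repository directory
TOPLEVEL=$(git -C "$REPO" rev-parse --show-toplevel)          # root of the working tree
IS_BARE=$(git -C "$REPO" rev-parse --is-bare-repository)      # "true" or "false"
IS_INSIDE=$(git -C "$REPO" rev-parse --is-inside-work-tree)   # "true" or "false"
OBJECT_FORMAT=$(git -C "$REPO" rev-parse --show-object-format) # hash algorithm in use

printf 'git-dir:       %s\n' "$GIT_DIR_PATH"
printf 'toplevel:      %s\n' "$TOPLEVEL"
printf 'bare:          %s\n' "$IS_BARE"
printf 'in work tree:  %s\n' "$IS_INSIDE"
printf 'object format: %s\n' "$OBJECT_FORMAT"

# Remove the throwaway repository again.
rm -rf "$REPO"
```

None of these calls has anything to do with revisions, which is why they
look like candidates for a dedicated command.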

> So the scope of the project is somewhat more limited compared to what
> you propose here. As that impacts a lot of the implementation details as
> well as the project timeline I'm not going to comment on these now.
>

I think some parts of this proposal, such as the CJSON discussion and
the repository details, still fit into this revised plan, but I will
send a revised proposal covering the changes in detail.

> > ## Detailed Project Timeline
> >
> >
> > **Phase 0: Pre-Acceptance Preparation (April 9 - May 7, 2025)**
> >
> > * **Focus:** Demonstrate continued interest and deepen understanding
> > while awaiting results.
> > * **Official GSoC Milestone:** April 8, 2025 - Proposal Deadline.
> > * **Activities:**
> > * **(April 9 - April 21):** Deep dive into Git's source code
> > structure, focusing specifically on areas identified in the proposal's
> > Technical Details:
> > * `builtin/` directory structure and command handling.
> > * `repository.h`, `refs.h`, `remote.h`, `config.c`, `strbuf.h`.
> > * How existing commands like `git status`, `git branch`, `git
> > rev-parse`, `git remote -v` access underlying data.
> > * **(April 22 - May 7):**
> > * Monitor the Git mailing list for discussions related to repository
> > information, command output formats, or JSON usage.
> > * Refine understanding of Git's testing framework, as I've not done a
> > deep dive into the tests (`t/test-lib.sh`). Try running and understanding
> > existing tests relevant to refs, remotes, or configuration.
> > * Review Git's contribution guidelines (`SubmittingPatches`, coding
> > style) again since most of my microproject time was related to
> > documentation.
> > * Try to start some more microprojects or actively converse in other patches.
>
> Note that microprojects are supposed to be finished before submitting
> your proposal. They are used for us mentors to figure out whether
> candidates would be a good fit or not. So ideally, you would prominently
> link to one or more of your finished microprojects in the proposal
> itself already.
>

I see you noticed below that I have been active in a microproject. I
will move it up and make it more noticeable. Thank you for pointing it
out!

> > **Phase 4: Documentation, Polish & Stretch Goals (Coding Weeks 9-12:
> > July 22 - Aug 18, 2025 Approx.)**
> >
> > * **Focus:** Finalize documentation, implement error handling, address
> > feedback, attempt stretch goals if feasible.
> > * **Activities:**
> > * **(Week 9: July 22 - July 28):** Complete the first draft of the man
> > page, detailing usage, JSON schema, and options. Implement the
> > `--json-errors` functionality for structured error reporting. Add
> > tests for error cases.
> > * **(Week 10: July 29 - Aug 4):** *Begin Stretch Goals (Conditional):*
> > If core work is stable and time permits, start implementing
> > `--head-only` / `--remotes-only` flags or the basic `is_dirty` check.
> > Add tests for any implemented stretch goals.
> > * **(Week 11: Aug 5 - Aug 11):** Thorough code cleanup, address all
> > outstanding review comments on submitted patches. Ensure documentation
> > is comprehensive and accurate. Final pass on test suite coverage.
> > * **(Week 12: Aug 12 - Aug 18):** Prepare and submit final patches
> > incorporating documentation, error handling, and any completed stretch
> > goals. Final code freeze for GSoC evaluation purposes. Write blog post
> > update summarizing final phase.
>
> One thing that I also mentioned to others: instead of planning for one
> big batch of work, I would strongly recommend to plan your work in
> smaller batches. You should ideally have multiple self-contained batches
> of work that you can submit as early as possible while still bringing
> some value to the project. This ensures that you can get feedback from
> the bigger community early on.
>

OK, so I will reshape my timeline so that it lists the individual patch
series, which then converge into the bigger project at the end.

> > ## Past Communication and Microproject
> > * **Blog**: [Blog](https://jayatheerthkulkarni.github.io/gsoc_blog/index.html)
> > This blog contains a detailed communication description and blog of my
> > microproject experience.
> > * First Introduction to the Git Mailing list: [first
> > Mail](https://lore.kernel.org/git/CA+rGoLc69R8qgbkYQiKoc2uweDwD10mxZXYFSY8xFs5eKSRVkA@xxxxxxxxxxxxxx/t/#u)
> > * First patch to the git mailing list: [First
> > Patch](https://lore.kernel.org/git/20250312081534.75536-1-jayatheerthkulkarni2005@xxxxxxxxx/t/#u)
> > * Most recent series of patches and back and forth with feedbacks:
> > [Main mail thread](https://lore.kernel.org/git/xmqqa59evffd.fsf@gitster.g/T/#t)
> >
> > I've been maintaining the blog and will keep documenting all of my
> > communication with the Git mailing list there.
>
> Ah, you do have a microproject. As this is part of the prerequisites I
> would like to propose to have this more prominently visible.
>
> Thanks!
>

Thanks again, this helps a lot.

> Patrick

Thank you,
Jay




