Re: [GSOC PROPOSAL 2025] Machine-Readable Repository Information Query Tool

MOUMITA DHAR <dhar61595@xxxxxxxxx> writes:

Hello Moumita,

> Hello ,
> This is my GSOC proposal
> The doc version -
> https://docs.google.com/document/d/1f1npZ7Ye-FOZENkfaR4SR2TrgSXnNlI8hvC2T0hJA_Y/edit?usp=sharing
>
> # Proposal for GSOC 2025
>
> ## Project - Machine-Readable Repository Information Query Tool
>
> ## Personal information
>
> Name - Moumita Dhar
> Email - dhar61595@xxxxxxxxx
> Github - https://github.com/Mou887
> LinkedIn - https://www.linkedin.com/in/moumita-dhar-234940253/
>
> ## About me
>
> I’m a self-taught programmer who began my coding journey in 2022. I
> got started by taking [CS50: Introduction to Computer
> Science](https://cs50.harvard.edu/x/) by Harvard University on edX,
> which sparked my curiosity about how software really works. Since
> then, I’ve been learning independently, completing several follow-up
> CS50 courses like **CS50’s Web Programming**, and **CS50’s
> Understanding Technology**, to build a strong foundation in computer
> science and software development.
>
> While I hold a university degree, my academic background is **not in
> computer science**. However, I have consistently dedicated my time and
> energy to learning programming concepts, tools, and real-world
> development workflows on my own. I’m passionate about systems
> programming, developer tools, and contributing to meaningful
> open-source projects.
>

The rules [1] don't mention anything about requiring a *computer
science* degree, so all participants are welcome!

> I’m participating in GSoC under the **Open Source Beginner** category.
> Even though I’m not currently a student, GSoC represents a unique
> opportunity for me to gain valuable mentorship and experience in
> large-scale software collaboration, while contributing to a project I
> deeply care about.
>
> Outside of coursework and learning, I’ve also explored Git’s internals
> through personal projects and patches, and I’m excited to take this
> further through GSoC.
>
> ## Microproject
>
> Status - Under discussion
>
> Mail thread - https://lore.kernel.org/git/20250330134018.9662-2-dhar61595@xxxxxxxxx/
>
> Description - I contributed to Git’s `userdiff` system by enhancing
> syntax detection for shell scripts. I focused on improving how Git
> highlights and navigates function definitions and words in Bash
> scripts during diffs. I have iterated over four patch versions based
> on reviewer feedback.
>
> ## Project Overview: Decluttering `git rev-parse`
>
> The core purpose of the command was to:
> **Parse revision identifiers** like `HEAD`, `master~2`, `origin/HEAD`, or tags.
>
> **Convert symbolic references** into full 40-character commit hashes.
>
> **Resolve user input** into unambiguous commit IDs for internal use.
>
> Over time, developers began adding utility options to `git rev-parse`
> that had **nothing to do with parsing revisions**, such as:
>
> * `--is-bare-repository`
>
> * `--git-dir`
>
> * `--show-toplevel`
>
> * `--is-inside-work-tree`
>
> This project aims to:
>
> 1. **Extract non-revision-parsing functionality from `git rev-parse`.**
>
> 2. **Create a new structured command** (e.g., `git repo-info`)
> dedicated to:
>
> *   Repository paths and environment
> *   Status checks
> *   Format queries
> *  Superproject relationships
> * Git environment variables
>

Makes sense. As I mentioned on another proposal [2], it would be nice
to mention that everything under the 'Options for Files' section of the
'git rev-parse' manpage probably needs a new home.

I also think you should elaborate on what the new command would look
like. Will we simply copy over the options? Will there be better, more
consistent naming? What would the default output for 'git repo-info' be?
Also, how do you justify the name? Is it consistent with the command
names in Git? Is it self-explanatory?

It would also be nice to write a brief description of how you plan to
tackle this, not from a timeline perspective but from a technical
perspective.

> ## Project Timeline
>
> ### Community Bonding Period (Before June 2)
>
> * Finalize the scope and confirm overall design with mentors.
>
> * Settle on command name (e.g., `git-info`, `git-meta`) and structure.
>

I would suggest involving the mailing list as soon as possible, as you'd
get some good feedback around the early design.

> * Review how `git-rev-parse` implements the related options.
>
> * Draft the expected JSON output format for each functionality area.
>
> ### Week 1 (June 2–8): Repository Path Information
>
> * Implement logic to report on repository layout and paths:
>
>    `.git` directory, common directory, top-level path, relative and
> absolute paths, etc.
>
>     Related options: `--git-dir`, `--git-common-dir`, `--git-path`,
> `--show-toplevel`,       `--show-cdup`, `--show-prefix`,
> `--absolute-git-dir`
>

Nice, I like that the project is broken down into smaller modules.

> *     Introduce new command skeleton and first subcommand infrastructure.
>
> * Output structured data (e.g., JSON).

How do you plan to tackle this? Have you taken a look at json-writer.[c,h]?
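
To make the question concrete, here is a rough sketch of the kind of
typed, structured output I have in mind. It is a standalone Python
script that wraps existing `git rev-parse` options; it does not use
json-writer.[c,h] and is not a proposed design. The JSON key names and
the `repo_info()` and `run_rev_parse()` helpers are made up purely for
illustration.

```python
import json
import subprocess

# Existing `git rev-parse` options mapped to hypothetical JSON keys.
# The key names are invented for this sketch; they are not a proposed schema.
PATH_OPTIONS = {
    "git_dir": "--git-dir",
    "toplevel": "--show-toplevel",
}
BOOL_OPTIONS = {
    "is_bare": "--is-bare-repository",
    "is_inside_work_tree": "--is-inside-work-tree",
}


def run_rev_parse(option):
    """Run `git rev-parse <option>` in the current directory; return its output."""
    result = subprocess.run(
        ["git", "rev-parse", option],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def repo_info(run=run_rev_parse):
    """Collect path and state values into one dict with typed values.

    `run` is injectable so the function can be exercised without a repository.
    """
    info = {key: run(opt) for key, opt in PATH_OPTIONS.items()}
    # rev-parse prints the literal strings "true"/"false" for the --is-* options,
    # so convert them to real booleans before serializing.
    info.update({key: run(opt) == "true" for key, opt in BOOL_OPTIONS.items()})
    return info
```

Running `print(json.dumps(repo_info(), indent=2))` inside a non-bare
repository prints one JSON object with string paths and real booleans.
In a bare repository `--show-toplevel` fails, so the sketch raises
`subprocess.CalledProcessError`; how the real command should report
such cases is exactly the kind of thing worth spelling out.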

>
> * Write an initial test suite and begin documentation.
>
> ### Week 2 (June 9–15): Git Environment Context
>
> *  Handle environment reporting:-
>
>       List Git-relevant environment variables (e.g., `GIT_DIR`,
> `GIT_WORK_TREE`, etc.)
>
>       Related option: `--local-env-vars`
>
> *  Ensure the output is shell-safe and informative for scripting use.
>
> * Write tests covering multiple shell environments.
>
> * Finalize docs and polish previous week’s code based on mentor feedback.
>

I think this is a good point. Generally things take a long time, since
we need to sync with the mailing list and ensure the series is up to a
good standard. Then the topic will slowly move from seen -> next -> master.

> ### **Week 3 (June 16–22): Repository State and Status**
>
> * Implement checks for current repo state:-
>
>
>                If the repo is a bare repo, shallow clone,  inside
> `.git` or working tree.
>
>    Related options: `--is-bare-repository`, `--is-shallow-repository`,
>    `--is-inside-git-dir`, `--is-inside-work-tree`
>
> * Add structured output with booleans for each status.
>
> * Test across various repo types (bare, shallow, normal).
>
> * Document usage and update test coverage.
>
> ### Week 4 (June 23–29): Object and Ref Format Reporting
>
> * Report the object format and reference storage format used:-
>
>        SHA-1/SHA-256, loose or reftable, etc.
>
>        Related options: `--show-object-format`, `--show-ref-format`.
>
> *  Ensure fallback behavior works for older Git versions or partial
> configurations.
>
> *  Add comprehensive tests and documentation for this area.
>
> ### Week 5 (June 30–July 6): Review & Midterm Prep
>
> * Integrate feedback on the previous four areas.
>
> * Finalize documentation and tests.
>
> * Clean up patch series.
>
> * Run full test suite and verify output consistency.
>
> * Prepare for **midterm submission**.
>
> ### Week 6 (July 7–13): Superproject Awareness
>
> *  Implement logic to determine whether the current repo is inside a
> superproject:-
>
>
>                 Show the outer working tree if present.
>
>                 Related option: `--show-superproject-working-tree`
>
> *     Handle edge cases where repo is not a submodule.
>
> *     Write test coverage and update documentation accordingly.
>
> ### Week 7 (July 14–20): Path Resolution Logic
>
> * Add functionality to resolve Git-related paths:
>
>
>                 Handle symlinks, relative paths, and `.git` indirection.
>
>                  Related option: `--resolve-git-dir`
>
> * Focus on correctness and compatibility.
>
> * Add comprehensive tests (symlinks, embedded repos, relative vs absolute).
>
> * Document clearly.
>
> ### Week 8 (July 21–27): Code Review & Integration
>
>
>
> * Submit patch series for areas from Weeks 6–7.
>
> * Begin integrating all subcommands into a consistent command structure.
>

Could you expand on what you mean here?

> * Ensure consistent JSON schema and error handling.
>

And here.

> * Begin polish and unification.
>
> ### Week 9 (July 28–Aug 3): Unified Output and CLI Polish
>
> * Implement a top-level dispatcher for all functionality areas.
>

This too. What is a dispatcher in the context of our codebase?

> * Add `--format=json` or similar flags for consistent CLI interface.
>
> * Write integration tests across all supported repo states.
>

Aren't tests covered as part of each batch of work? What do these tests
add, and why aren't they part of the initial tests?

> * Run  full test suite in clean and dirty trees.
>

This should be part of each batch, no?

> ### Week 10 (Aug 4–10): Final Documentation and Usability
>
> *  Write a complete manpage for the new command.
>

I would say each patch should carry its corresponding documentation; it
is not something we want to work on at the end. We don't want a project
left midway without _any_ documentation. It'd be better if sufficient
documentation is added with each new block of changes, so that the state
of the project is not lacking at any point. So code, tests, and
documentation should all be part of each block of work you do.

> * Add real-world examples and shell usage patterns.
>
> * Run `check-docs`, validate formatting and help output.
>
>
> ### Week 11 (Aug 11–17): Final Mentor Review and Bugfixes
>
> * Submit a full final patch series.
>
> * Incorporate the last round of mentor feedback.
>

I think this, too, is part of each step of the process.

> * Clean up commit messages and inline comments.
>
> * Final CI runs and Git project best practices review.
>
> ###  Week 12 (Aug 18–24): Submission and Wrap-Up
>
> * Submit final work to Git mailing list (if not already).
>
> * Complete final report, blog post, and GSoC submission.
>
> * Add final tests or polish based on review feedback.
>
> ### Final Week
>
> * Reserved for unforeseen delays or last-minute polish.
>

Overall, it seems like we're building up to one big patch series at the
end. The recommended route would be to split the work into small chunks
and get each chunk through one at a time. Each chunk would contain the
necessary code, tests, and documentation, and should be in a state where
it can be merged to the main tree.

> ### Time period from April 9 to  May 6
>
> During this period, I plan to work on a **practice patch** based on my
> current understanding of the project. This will help me evaluate how
> well I can implement the ideas outlined in my proposal and whether the
> timeline I’ve suggested is realistic.
>
> This preparatory work will allow me to:
>
> * Explore the relevant parts of the codebase in more depth
>
> * Validate my implementation approach with a small, isolated prototype
>
> * Build confidence in handling Git’s development workflow
> (compilation, testing, patch submission, etc.)
>
> I understand that official coding for GSoC begins in June, and I will
> reserve actual patch submissions for that period, in accordance with
> GSoC guidelines. The goal of this exercise is solely to prepare myself
> to contribute effectively and responsibly from day one.
>
> ## Blogging
>
> I will maintain a blog  to document my progress, challenges, and
> learnings throughout the program. This will serve both as a personal
> reflection and a way to give back to the community by helping future
> contributors understand the development process within Git. I will
> post regular updates—starting from the community bonding period
> through to the final evaluation—covering details like subcommand
> implementation, testing strategies, mailing list interactions, and
> reviews.
> My blog: https://hashnode.com/@Moumita
>
> ## Post GSOC
>
> My involvement with Git will not end with the GSoC coding period. I
> intend to continue contributing to the Git project even after GSoC
> concludes by following up on any remaining feedback related to my
> project, further refining and expanding the new command as needed, and
> actively participating in the community through patch reviews and
> mailing list discussions. I also plan to explore and work on other
> issues or features in the Git codebase that align with my interests.
> Through GSoC, I hope to establish myself as a long-term contributor to
> Git. I see this project not just as a summer commitment, but as the
> start of a deeper and ongoing engagement with the Git project and the
> broader open source community.
>
> ## Availability
>
> I am fully available for GSoC and can dedicate **approximately 8 hours
> per day, 7 days a week**, which totals to **about 50–56 hours per
> week**. I do not have any academic or job commitments during the GSoC
> period and can devote my full attention to the project.
>
> This flexibility allows me to accommodate feedback, mentor
> communication, code reviews, and unexpected blockers without falling
> behind on the proposed timeline. I'm also willing to adjust my
> schedule if needed to better sync with my mentor’s availability or
> project needs.

Thanks for the proposal!

- Karthik


[1]: https://summerofcode.withgoogle.com/rules
[2]: https://lore.kernel.org/git/CAOLa=ZQSnwSPw1U_-2YZzjK5z_jUEB3vGy=So5e+gpOa87Ei=w@xxxxxxxxxxxxxx/T/#mc4c5c87594cd2e0ea795259a6868b3494781cf86
