Re: [GSOC] [Proposal v1] Machine-Readable Repository Information Query Tool


 



# Proposal for GSOC 2025 to Git
**Machine-Readable Repository Information Query Tool**

## Contact Details
* **Name**: K Jayatheerth
* **Email**: jayatheerthkulkarni2005@xxxxxxxxx
* **Blog**: [Blog](https://jayatheerthkulkarni.github.io/gsoc_blog/index.html)
* **GitHub**: [GitHub](https://github.com/jayatheerthkulkarni)

## Prerequisites & Experience


As part of the GSoC application prerequisites, I have engaged with the
Git community and started a microproject: **updating
`MyFirstContribution.adoc` to match the current codebase**. This gave
me experience with Git's codebase structure (documentation files), the
contribution workflow (submitting patches with `git send-email`,
addressing feedback across versions), and communication on the
mailing list.

* **Microproject Status:** v4 submitted, incorporating feedback,
awaiting further review.
* **Microproject Patch Series:** [Main mail
thread](https://lore.kernel.org/git/xmqqa59evffd.fsf@gitster.g/T/#t)
(Link to the most relevant thread demonstrating interaction and
successful patch refinement)
* **Initial Patch:** [First
Patch](https://lore.kernel.org/git/20250312081534.75536-1-jayatheerthkulkarni2005@xxxxxxxxx/t/#u)
* **Mailing List Introduction:** [First
Mail](https://lore.kernel.org/git/CA+rGoLc69R8qgbkYQiKoc2uweDwD10mxZXYFSY8xFs5eKSRVkA@xxxxxxxxxxxxxx/t/#u)
* **Blog:** My GSoC blog details these interactions:
[Blog](https://jayatheerthkulkarni.github.io/gsoc_blog/index.html)

## **Synopsis**

This project focuses on **refactoring Git by creating a dedicated
command (tentatively named `git info`, subject to further discussion)
to house the low-level repository, path, and format-related query
options currently misplaced under the "OPTIONS FOR FILES" section of
`git-rev-parse(1)`**. This new command, potentially using a subcommand
structure (e.g., `git info path`, `git info repo`), will provide a
more logical and maintainable location for this functionality. This
allows `git rev-parse` to better focus on its core purpose of parsing
revisions, ultimately improving Git's internal organization and
command structure clarity by offering a **cleaner interface** for
these specific queries.

## **Benefits to the Community**

### **1. Improves `git rev-parse` Clarity and Maintainability**
- `git rev-parse` has accumulated various options unrelated to its
primary purpose of parsing revisions, particularly those for querying
low-level repository state and paths.
- This project **directly addresses this issue** by migrating these
options to a dedicated, purpose-built command, making `git rev-parse`
cleaner and easier to understand and maintain.
- Provides a **clearer separation of concerns** within Git's command suite.

### **2. Provides Reliable Access for Automation and Scripting**
- Scripts often need fundamental repository information like the
top-level directory path, the `.git` directory location, or repository
state.
- Currently, scripts rely on `git rev-parse` for this, invoking it for
tasks outside its core revision-parsing role.
- The new `git info` command will offer a **stable, dedicated, and
cleaner interface** for retrieving this specific low-level
information, making scripts **more robust and readable** by calling
the command designed explicitly for these tasks.



## Deliverables

This project will introduce a new Git command, **tentatively named
`git info`**, serving as the designated home for specific low-level
query options migrated from `git rev-parse`. The implementation will
likely adopt a **subcommand structure**.

The key deliverables for this GSoC project include:

1. **New Core Command: `git info` with Subcommands**
* A new `builtin/info.c` command integrated into the Git source code.
* Implementation primarily in C, using `parse-options` to handle
**subcommands** (e.g., `path`, `repo`, `misc`) and their specific
options.
* Leverages existing internal Git APIs and logic currently within `rev-parse.c`.

2. **Relocated `rev-parse` Options under Subcommands:**
* Implementation of the core functionality behind selected options
from `git-rev-parse(1)`'s "OPTIONS FOR FILES" section, organized under
appropriate subcommands within `git info`. *(Specific options and
subcommand grouping subject to final confirmation with mentor)*:
* **`git info path ...` (Example Grouping):**
  * `--show-cdup` -> `git info path --cdup` (or similar)
  * `--show-prefix` -> `git info path --prefix`
  * `--show-toplevel` -> `git info path --toplevel`
  * `--show-superproject-working-tree` -> `git info path --superproject-worktree`
* **`git info repo ...` (Example Grouping):**
  * `--git-dir` -> `git info repo --git-dir`
  * `--git-common-dir` -> `git info repo --common-dir`
  * `--resolve-git-dir <path>` -> `git info repo --resolve-dir <path>`
  * `--is-bare-repository` -> `git info repo --is-bare`
  * `--is-shallow-repository` -> `git info repo --is-shallow`
* **`git info misc ...` (Example Grouping for others):**
  * `--is-inside-git-dir` -> `git info misc --inside-gitdir`
  * `--is-inside-work-tree` -> `git info misc --inside-worktree`
  * `--shared-index-path` -> `git info misc --shared-index-path`
* *(Design Consideration):* Option names within subcommands might be
slightly adjusted for clarity/consistency (e.g., dropping "show-").
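
The tentative mapping above can be written down as a small lookup
table, which might also be handy when adding cross-references or
deprecation hints to `rev-parse`. This is only a sketch: the struct
`info_mapping` and the function `find_info_mapping()` are hypothetical
names, not existing Git code, and the new option names are the
tentative ones from the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per option listed above; the new names are tentative. */
struct info_mapping {
	const char *rev_parse_opt; /* existing `git rev-parse` option */
	const char *subcommand;    /* proposed `git info` subcommand */
	const char *info_opt;      /* proposed option under that subcommand */
};

static const struct info_mapping info_map[] = {
	{ "--show-cdup", "path", "--cdup" },
	{ "--show-prefix", "path", "--prefix" },
	{ "--show-toplevel", "path", "--toplevel" },
	{ "--show-superproject-working-tree", "path", "--superproject-worktree" },
	{ "--git-dir", "repo", "--git-dir" },
	{ "--git-common-dir", "repo", "--common-dir" },
	{ "--resolve-git-dir", "repo", "--resolve-dir" },
	{ "--is-bare-repository", "repo", "--is-bare" },
	{ "--is-shallow-repository", "repo", "--is-shallow" },
	{ "--is-inside-git-dir", "misc", "--inside-gitdir" },
	{ "--is-inside-work-tree", "misc", "--inside-worktree" },
	{ "--shared-index-path", "misc", "--shared-index-path" },
};

/* Return the mapping for a rev-parse option, or NULL if it is not migrated. */
const struct info_mapping *find_info_mapping(const char *rev_parse_opt)
{
	size_t i;

	assert(rev_parse_opt != NULL);
	for (i = 0; i < sizeof(info_map) / sizeof(info_map[0]); i++)
		if (!strcmp(info_map[i].rev_parse_opt, rev_parse_opt))
			return &info_map[i];
	return NULL;
}
```

For example, `find_info_mapping("--show-toplevel")` yields the row
with subcommand `path` and option `--toplevel`, while an option that
stays in `rev-parse` (such as `--verify`) yields `NULL`.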

3. **Multiple Output Formats:**
* **Default Text Output:** The default output for each implemented
option will be simple, human-readable text, **matching the semantics
and format** produced by the corresponding `git rev-parse` option
(e.g., printing a path string, or the literal "true"/"false" for
boolean checks, which `git rev-parse` reports on stdout rather than
through its exit status).
* **NUL Termination (`-z`):** Implement a `-z` option (standard across
many Git plumbing commands) for unambiguous, newline-safe output
suitable for scripting, particularly for path-related options.
* **JSON Output (`--json`):** Implement a `--json` option to provide
structured output, mapping query keys (derived from options) to their
values. This offers maximum flexibility for tools consuming the
information. *(The relative priority and implementation details of
`-z` vs `--json` to be discussed with mentor, but both are considered
core deliverables)*.
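
To make the difference between the three formats concrete, here is a
minimal standalone sketch that formats a single key/value result as
text, NUL-terminated, or a one-member JSON object. It is plain C and
does not use Git's internal APIs; `enum info_format`,
`format_info_value()` and the key name `toplevel` are hypothetical
names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum info_format { INFO_TEXT, INFO_NUL, INFO_JSON };

/* Store one byte if room remains; always count it so callers see the full size. */
static void put_byte(char *buf, size_t cap, size_t *len, char c)
{
	if (*len < cap)
		buf[*len] = c;
	(*len)++;
}

/* Append a JSON string literal, escaping quotes, backslashes and control bytes. */
static void put_json_string(char *buf, size_t cap, size_t *len, const char *s)
{
	static const char hex[] = "0123456789abcdef";

	put_byte(buf, cap, len, '"');
	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;

		if (c == '"' || c == '\\') {
			put_byte(buf, cap, len, '\\');
			put_byte(buf, cap, len, (char)c);
		} else if (c < 0x20) {
			/* Control bytes (including newline) use the \u00XX form. */
			put_byte(buf, cap, len, '\\');
			put_byte(buf, cap, len, 'u');
			put_byte(buf, cap, len, '0');
			put_byte(buf, cap, len, '0');
			put_byte(buf, cap, len, hex[c >> 4]);
			put_byte(buf, cap, len, hex[c & 0xf]);
		} else {
			put_byte(buf, cap, len, (char)c);
		}
	}
	put_byte(buf, cap, len, '"');
}

/*
 * Format one result into buf (no trailing string terminator is added) and
 * return the number of bytes the full output needs, which may exceed cap.
 */
size_t format_info_value(char *buf, size_t cap, enum info_format fmt,
			 const char *key, const char *value)
{
	size_t len = 0;
	const char *p;

	assert(key != NULL && value != NULL);
	switch (fmt) {
	case INFO_TEXT:
	case INFO_NUL:
		for (p = value; *p; p++)
			put_byte(buf, cap, &len, *p);
		/* Only the record terminator differs between text and -z. */
		put_byte(buf, cap, &len, fmt == INFO_NUL ? '\0' : '\n');
		break;
	case INFO_JSON:
		put_byte(buf, cap, &len, '{');
		put_json_string(buf, cap, &len, key);
		put_byte(buf, cap, &len, ':');
		put_json_string(buf, cap, &len, value);
		put_byte(buf, cap, &len, '}');
		put_byte(buf, cap, &len, '\n');
		break;
	}
	return len;
}
```

The sketch copies bytes above 0x7f through unchanged, so a path that
is not valid UTF-8 would produce invalid JSON; how `--json` should
treat such paths is an open design question for the mentor
discussion.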

4. **Comprehensive Documentation (Incremental):**
* A clear man page (`git-info.adoc`) explaining the new command's
purpose, the subcommand structure, and detailing the usage, options
(including `-z`, `--json`), and output formats for each implemented
feature. **Relevant sections of the man page will be added or updated
within each patch series submitted.**
* Updates to `git-rev-parse.adoc` to clearly **document the
relationship** with `git info` for the migrated options (e.g., noting
that `git info` is the preferred command) and potentially marking them
for deprecation. **These updates will also be included incrementally
with relevant patch series.**

5. **Robust Test Suite (Incremental):**
* A new test script (`t/tXXXX-info.sh`) using Git's test framework
(`test-lib.sh`).
* Tests covering the subcommand structure, each implemented option,
and **all output formats** (`text`, `-z`, `--json`).
* Tests validating behavior across various repository states
(standard, bare, inside `.git`, inside worktree, submodules, shallow
clone etc.). **New tests will be added within each patch series for
the features implemented.**


## Technical Details


1. **Core `git info` Command Implementation:**
* **Entry Point:** Create `builtin/info.c` with `cmd_info(...)`. Use
`parse-options` to parse the **subcommand** first. Based on the
subcommand, invoke a specific helper function (e.g.,
`cmd_info_path()`, `cmd_info_repo()`) which then uses `parse-options`
again to handle the options specific to that subcommand.
* **Repository Context:** Standard setup using `repo` structure,
`startup_info`, and potentially `setup_git_directory_gently`.
* **Reusing Logic:** Adapt logic from `builtin/rev-parse.c` for the
core functionality of each option. This might involve direct code
migration or creating shared helper functions where appropriate.
* **Subcommand Implementation:** Implement helper functions for each
subcommand (`path`, `repo`, `misc`) containing the `parse_options`
calls and logic for the options within that group.
* **Output Generation:**
* **Text (Default):** Use `printf("%s\n", ...)` / `puts(...)` for
string output; print "true"/"false" for boolean checks, as `rev-parse`
does (it reports them on stdout rather than through the exit status).
* **NUL (`-z`):** Use `putchar('\0')` or `fwrite(..., 1, 1, stdout)`
instead of newline for string output when `-z` is active. Boolean
checks likely keep printing "true"/"false".
* **JSON (`--json`):** Collect results internally. Use Git's existing
`json-writer` API (`json-writer.h`, which builds on `strbuf`) or
potentially an approved C JSON library to construct and print a JSON
object mapping keys to values. All requested info within a single
invocation should ideally be combined into one JSON object.
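
The two-level parsing described in the entry point above (subcommand
first, then the subcommand's own options) can be sketched as a name
to handler table. This is a standalone illustration in plain C, not
the planned code: the real command would go through `parse-options`,
the handler bodies here are empty placeholders, and
`find_info_subcommand()` and `dispatch_info()` are hypothetical
names. The value 129 is the exit status Git uses for usage errors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*info_subcommand_fn)(int argc, const char **argv);

/* Placeholder handlers: each would parse its own options and print results. */
static int cmd_info_path(int argc, const char **argv)
{
	(void)argc; (void)argv;
	return 0;
}

static int cmd_info_repo(int argc, const char **argv)
{
	(void)argc; (void)argv;
	return 0;
}

static int cmd_info_misc(int argc, const char **argv)
{
	(void)argc; (void)argv;
	return 0;
}

struct info_subcommand {
	const char *name;
	info_subcommand_fn fn;
};

static const struct info_subcommand info_subcommands[] = {
	{ "path", cmd_info_path },
	{ "repo", cmd_info_repo },
	{ "misc", cmd_info_misc },
};

/* Look up the handler for a subcommand name; NULL means "unknown subcommand". */
info_subcommand_fn find_info_subcommand(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(info_subcommands) / sizeof(info_subcommands[0]); i++)
		if (!strcmp(info_subcommands[i].name, name))
			return info_subcommands[i].fn;
	return NULL;
}

/*
 * Dispatch `info <subcommand> [<options>]`, where argv[0] is "info".
 * Returns the handler's result, or 129 for a missing or unknown subcommand.
 */
int dispatch_info(int argc, const char **argv)
{
	info_subcommand_fn fn;

	assert(argv != NULL);
	if (argc < 2)
		return 129;
	fn = find_info_subcommand(argv[1]);
	if (!fn)
		return 129;
	/* The handler sees the subcommand name as its own argv[0]. */
	return fn(argc - 1, argv + 1);
}
```

Keeping the table separate from the handlers means adding a fourth
subcommand later only needs one new row and one new function.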

2. **Documentation:**
* Create `Documentation/git-info.adoc`. Structure based on
subcommands. Detail each subcommand and its options, including `-z`
and `--json` behavior.
* Modify `Documentation/git-rev-parse.adoc` to add cross-references
for relevant options.
* **Documentation updates will accompany the code changes in each
patch series.**

3. **Testing:**
* Create `t/tXXXX-info.sh`.
* Use `test_expect_success` with helpers like `test_create_repo`,
`test_cmp`, `test_must_fail`.
* Add tests for:
* Correct subcommand parsing and error handling.
* Each option under its subcommand, comparing **text output** against
`rev-parse` (where applicable) or expected values/exit codes.
* **`-z` output** using appropriate comparison methods (e.g., piping
to `tr '\0' '\n'`).
* **`--json` output** using tools like `jq` (if available in test env)
or careful `grep`/`sed` checks for structure and values.
* **Tests will be added incrementally with the features in each patch series.**

## Detailed Project Timeline


**Phase 0: Pre-Acceptance Preparation (April 9 - May 7, 2025)**
*

**Phase 1: Community Bonding & Final Planning (May 8 - May 26, 2025 Approx.)**
* **Focus:** Formal introductions, finalize scope, agree on command
structure, setup.
* **Activities:**
* **(Week 1: May 8 - May 12):** Discuss proposal with mentor(s). Finalize:
* Command name (`git info` or alternative).
* Subcommand structure and grouping of options.
* Exact list of options to port, and any necessary renaming within subcommands.
* Approach for handling relationship with `rev-parse` (deprecation vs.
aliasing vs. simple documentation cross-link).
* Prioritization/approach for implementing `-z` and `--json`.
* **(Week 2: May 13 - May 19):** Set up dev environment. Deep dive
into agreed-upon code blocks in `rev-parse.c`. Outline
`builtin/info.c` structure including subcommand handlers. Outline
initial test script `t/tXXXX-info.sh`.
* **(Week 3: May 20 - May 26):** Implement basic `cmd_info` skeleton,
top-level subcommand parsing, repository setup. Implement one simple
subcommand handler (e.g., `cmd_info_path`) with basic option parsing
structure. Write initial "no-op" / basic structure tests. Post first
blog update.

**Phase 2: Implementation in Batches (Coding Weeks 1-8: May 27 - July
21, 2025 Approx.)**
* **Focus:** Implement options within subcommands, including
documentation and tests for text output first, then potentially add
machine-readable formats. Submit patches early and often.
* **GSoC Milestone:** Midterm Evaluations occur around Week 8.
* **Activities:** *(Structure assumes implementing text output first,
then `-z`/`--json` later in the phase)*
* **(Batch 1 / Weeks 1-2: May 27 - June 9):** Implement `path`
subcommand options (`--toplevel`, `--prefix`, `--cdup`). Implement
**text output**. Add corresponding **tests** and **documentation**
snippets (for `git-info.adoc` and `git-rev-parse.adoc`). **Submit
Patch Series 1**.
* **(Batch 2 / Weeks 3-4: June 10 - June 23):** Implement `repo`
subcommand options (`--git-dir`, `--common-dir`, `--resolve-dir`,
`--is-bare`). Implement **text output**. Add **tests** and
**documentation** snippets. **Submit Patch Series 2**. Write blog post
update.
* **(Batch 3 / Weeks 5-6: June 24 - July 7):** Implement remaining
`repo` (`--is-shallow`) and `misc` subcommand options
(`--inside-gitdir`, `--inside-worktree`, `--shared-index-path`,
`--superproject-worktree` - *adjust subcommand grouping based on final
plan*). Implement **text output**. Add **tests** and
**documentation**. **Submit Patch Series 3**.
* **(Batch 4 / Weeks 7-8: July 8 - July 21):** Implement **`-z` and
`--json` output formats** for all options added in Batches 1-3. Add
comprehensive **tests** for these formats. Update **documentation** to
fully describe `-z` and `--json` behavior. **Submit Patch Series 4**.
Prepare for Midterm evaluation; ensure submitted batches show
substantial progress on core functionality and formats. Write blog
post update.

**Phase 3: Refinement & Final Polish (Coding Weeks 9-12: July 22 - Aug
18, 2025 Approx.)**
* **Focus:** Address feedback on all patches, ensure robustness,
finalize documentation consistency.
* **Activities:**
* **(Week 9: July 22 - July 28):** Focus on addressing review comments
on **all** previous patch series (Code, Tests, Docs). Refactor based
on feedback.
* **(Week 10: July 29 - Aug 4):** Continue addressing feedback. Ensure
the test suite is robust, covers edge cases identified in reviews.
Perform thorough documentation review for consistency and clarity
across the entire man page.
* **(Week 11: Aug 5 - Aug 11):** Final code cleanup. Final pass on
test coverage. *(Stretch Goal Idea):* If all core work is stable and
time permits, potentially explore adding one or two *new*, simple,
agreed-upon repo info queries (not from `rev-parse`) that fit the
command's purpose.
* **(Week 12: Aug 12 - Aug 18):** Prepare and submit final versions of
all patch series, incorporating all feedback. Final self-testing.
Write blog post update summarizing progress and final state. Code
freeze for final evaluation.

**Phase 4: Final Evaluation & Wrap-up (Aug 19 - Nov 19, 2025)**
* Write final GSoC project summary blog post. Continue engaging with
the community through further contributions beyond GSoC.

Thank You,
Jayatheerth



