Re: [GSOC] [Proposal v1] Machine-Readable Repository Information Query Tool

# GSoC 2025 Proposal for Git
**Refactoring `git rev-parse`: A Dedicated Command for Repository Information**

## Contact Details
* **Name**: K Jayatheerth
* **Email**: jayatheerthkulkarni2005@xxxxxxxxx
* **Blog**: [Blog](https://jayatheerthkulkarni.github.io/gsoc_blog/index.html)
* **GitHub**: [GitHub](https://github.com/jayatheerthkulkarni)

## Prerequisites & Experience


As part of the GSoC application prerequisites, I completed a
microproject involving documentation changes and, through it, engaged
with the Git community. This provided valuable experience with Git's
codebase, contribution workflow (patch submission, feedback cycles),
and communication via the mailing list.

* **Microproject Patch Series:** [Main mail
thread](https://lore.kernel.org/git/xmqqa59evffd.fsf@gitster.g/T/#t)
(Link to the most relevant thread demonstrating interaction and
successful patch refinement)
* **Initial Patch:** [First
Patch](https://lore.kernel.org/git/20250312081534.75536-1-jayatheerthkulkarni2005@xxxxxxxxx/t/#u)
* **Mailing List Introduction:** [First
Mail](https://lore.kernel.org/git/CA+rGoLc69R8qgbkYQiKoc2uweDwD10mxZXYFSY8xFs5eKSRVkA@xxxxxxxxxxxxxx/t/#u)
* **Blog:** My GSoC blog details these interactions:
[Blog](https://jayatheerthkulkarni.github.io/gsoc_blog/index.html)

## **Synopsis**

This project focuses on **refactoring Git by creating a dedicated
command (tentatively named `git repo-info`) to house the low-level
repository, path, and format-related query options currently misplaced
under the "OPTIONS FOR FILES" section of `git-rev-parse(1)`**. This
new command will provide a more logical and maintainable location for
this functionality, allowing `git rev-parse` to better focus on its
core purpose of parsing revisions, thus improving Git's internal
organization and command structure clarity.

## **Benefits to the Community**

### **1. Improves `git rev-parse` Clarity and Maintainability**
- `git rev-parse` has accumulated various options unrelated to its
primary purpose of parsing revisions, particularly those for querying
low-level repository state and paths.
- This project **directly addresses this issue** by migrating these
options to a dedicated command, making `git rev-parse` cleaner and
easier to understand and maintain.
- Provides a **clearer separation of concerns** within Git's command suite.

### **2. Provides Reliable Access for Automation and Scripting**
- Scripts often need fundamental repository information like the
top-level directory path (`--show-toplevel`), the `.git` directory
location (`--git-dir`), or repository state (`--is-bare-repository`).
- Currently, scripts rely on `git rev-parse` for this, mixing
low-level repo queries with revision parsing calls.
- The new `git repo-info` command will offer a **stable, dedicated
interface** for retrieving this specific low-level information, making
scripts **cleaner and more robust** by calling the command designed
explicitly for these tasks.
- The default output will mimic the **existing, simple text format**
of the `rev-parse` options, ensuring compatibility for scripts
migrating to the new command.
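
As a rough illustration of the scripting pattern described above, here
is a minimal Python sketch of a helper that wraps today's
`git rev-parse` queries. The names `query_repo` and
`is_bare_repository` are hypothetical, and the injectable `run`
parameter is only an assumption made so the helper can be exercised
without a real repository.

```python
import subprocess


def query_repo(option, cwd=None, run=subprocess.run):
    """Return the output of `git rev-parse <option>` without the trailing newline.

    `run` is injectable (hypothetical design choice) so the helper can be
    tested without a real repository.
    """
    result = run(
        ["git", "rev-parse", option],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,  # raise if git exits non-zero, e.g. outside a repository
    )
    return result.stdout.rstrip("\n")


def is_bare_repository(cwd=None, run=subprocess.run):
    """Interpret the textual "true"/"false" answer as a Python bool."""
    return query_repo("--is-bare-repository", cwd=cwd, run=run) == "true"
```

If the proposed command lands with the same text output, a script like
this would presumably only need to change the `["git", "rev-parse", ...]`
argument list; that is an expectation based on the compatibility goal
above, not something that exists today.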

### **3. Enhances CI/CD Pipeline Foundations**
- CI/CD pipelines frequently need to establish context by determining
the repository root or `.git` directory location early in their
execution.
- Using the dedicated `git repo-info` command for these foundational
queries **simplifies the initial setup steps** in pipeline scripts
compared to using the overloaded `git rev-parse`.

## Deliverables

Because the project scope is focused on refactoring `git rev-parse`,
this project will introduce a new Git command, tentatively named
`git repo-info`, as the designated home for specific low-level query
options.

The key deliverables for this GSoC project include:

1. **New Core Command: `git repo-info`**
* A new `builtin/repo-info.c` command integrated into the Git source code.
* Implementation primarily in C, leveraging existing internal Git APIs
and logic currently within `builtin/rev-parse.c` to implement the
relocated options.

2. **Relocated `rev-parse` Options:**
* Implementation of the core functionality behind the following
options from `git-rev-parse(1)`'s "OPTIONS FOR FILES" section within
the new `git repo-info` command:
* **Path Queries:** `--show-cdup`, `--show-prefix`, `--show-toplevel`,
`--show-superproject-working-tree`
* **Directory Queries:** `--git-dir`, `--git-common-dir`,
`--resolve-git-dir <path>`
* **State/Format Queries:** `--is-inside-git-dir`,
`--is-inside-work-tree`, `--is-bare-repository`,
`--is-shallow-repository`
* **Index File Query:** `--shared-index-path`

3. **Default Output Format (Text-Based):**
* The command's default output for each implemented option will
**match the current plain text output** produced by `git rev-parse`
for that same option, ensuring backward compatibility for scripts
migrating to the new command. Output will primarily be via standard C
functions like `printf` or `puts`.

4. **Comprehensive Documentation:**
* A clear man page (`git-repo-info.adoc`) explaining the new command's
purpose and detailing the usage and output of each implemented option.
* Updates to `git-rev-parse.adoc` to clearly **deprecate** the
relocated options (or mark them as aliases for compatibility) and
point users to the new `git repo-info` command.

5. **Robust Test Suite:**
* A new test script (`t/tXXXX-repo-info.sh`) using Git's test
framework (`test-lib.sh`).
* Tests specifically validating the output of `git repo-info --option`
against the output of `git rev-parse --option` across various
repository states (standard repo, bare repo, inside `.git`, inside
worktree, submodules, shallow clone etc.) to ensure functional parity.

6. **(Stretch Goal / Potential Future Work): Structured Output**
* If time permits after successfully implementing, documenting, and
testing the core text-based functionality, investigate adding a
`--format=json` option to provide a structured JSON output containing
the results of the requested queries. This is explicitly a secondary
goal, contingent on completing the primary refactoring task.
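
The functional-parity idea behind deliverable 5 can be modelled in a
few lines. This is only a sketch in Python: the real tests would live
in a shell script under `t/`, and `find_mismatches`, `old_query`, and
`new_query` are hypothetical names. The two callables stand in for
"run `git rev-parse <option>`" and "run the proposed
`git repo-info <option>`".

```python
# Argument-less options from deliverable 2 (`--resolve-git-dir <path>`
# takes an argument and is left out of this simple list).
OPTIONS = [
    "--show-cdup",
    "--show-prefix",
    "--show-toplevel",
    "--show-superproject-working-tree",
    "--git-dir",
    "--git-common-dir",
    "--is-inside-git-dir",
    "--is-inside-work-tree",
    "--is-bare-repository",
    "--is-shallow-repository",
    "--shared-index-path",
]


def find_mismatches(options, old_query, new_query):
    """Return {option: (old_output, new_output)} for every differing option.

    `old_query` and `new_query` are callables mapping an option string to
    the raw output of the existing and the proposed command respectively.
    """
    mismatches = {}
    for option in options:
        old_out = old_query(option)
        new_out = new_query(option)
        if old_out != new_out:
            mismatches[option] = (old_out, new_out)
    return mismatches
```

An empty result means the two commands agree for every option checked;
the same comparison would be repeated for each repository state listed
in deliverable 5.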

**Out of Scope for GSoC (Based on Refined Goal):**
* Querying high-level metadata like current branch name, HEAD commit
details (beyond `--is-shallow-repository`), remote URLs, tags, or
arbitrary configuration values.
* Complex status reporting (worktree dirtiness).
* Real-time monitoring or comparing metadata between revisions.
* Implementing JSON output as the *primary* feature.

## Technical Details

This section outlines the proposed technical approach for implementing
the `git repo-info` command and relocating the specified options:

1. **Core `git repo-info` Command Implementation:**
* **Entry Point:** Create `builtin/repo-info.c` with
`cmd_repo_info(...)` function. Parse options using Git's
`parse-options` API.
* **Repository Context:** Utilize the standard `repo` structure and
`startup_info` provided by Git's infrastructure. Set up the repository
context similarly to how `cmd_rev_parse` does it if needed (e.g., using
`setup_git_directory_gently`).
* **Reusing Logic:** Analyze the implementation of the target options
within `builtin/rev-parse.c`. Extract and adapt the relevant C
functions and logic (related to path manipulation using `prefix_path`,
`real_pathcmp`; repository state checks using `is_bare_repository`,
`is_inside_git_dir`, `is_inside_work_tree`; accessing `startup_info`,
`git_path`, etc.) into `builtin/repo-info.c` or, where appropriate,
into shared helper functions.
* **Specific Option Implementation:**
* `--show-toplevel`, `--show-cdup`, `--show-prefix`: Rely on the
`prefix` calculated during setup and path manipulation functions.
* `--git-dir`, `--git-common-dir`: Access `repo->gitdir`,
`repo->commondir` or use functions like `get_git_dir()`,
`get_common_dir()`. `--resolve-git-dir` will involve path resolution
relative to the provided argument.
* `--is-*` flags: Call existing helper functions like
`is_bare_repository()`, `is_inside_git_dir()`,
`is_inside_work_tree()`. `--is-shallow-repository` involves calling
`is_repository_shallow()`.
* `--shared-index-path`: Access path information related to split
indexes if enabled.
* **Output Generation:** Use standard C `printf("%s\n", ...)` or
`puts(...)` to print the resulting string (path, "true"/"false", etc.)
to standard output, matching `rev-parse`'s current behavior. The
boolean `--is-*` flags print `true` or `false` and exit `0` in either
case rather than signalling the answer through the exit code; this
behavior should be preserved.

2. **Documentation:**
* Create `Documentation/git-repo-info.adoc` using AsciiDoc format,
modeling it after existing man pages. Detail each option, its purpose,
and expected output.
* Modify `Documentation/git-rev-parse.adoc`, adding notes to the
relevant options indicating they are better handled by `git repo-info`
and potentially marking them for deprecation in a future Git version.

3. **Testing:**
* Create `t/tXXXX-repo-info.sh` using `test-lib.sh`.
* Structure tests using `test_expect_success` blocks.
* Utilize helpers like `test_create_repo` and `test_cmp` to compare the
output of `git repo-info --option` directly against
`git rev-parse --option`, including the `true`/`false` text printed by
the boolean flags, and check exit codes where they matter (e.g., when
run outside a repository).
* Cover edge cases like running outside a repository, in a bare
repository, deep within a worktree, within the `.git` directory, and
in repositories with submodules or worktrees.

4. **(Stretch Goal) JSON Output Implementation:**
* If attempted, add a `--format=json` option using `parse-options`.
* Collect results from the requested options internally.
* Use Git's in-tree `json-writer` API (`json-writer.h`, already used by
trace2) or, failing that, careful `strbuf`-based construction to build
a JSON object mapping option names (or descriptive keys) to their
corresponding values. Print the final JSON string to standard output.
Add specific tests for JSON output validation.
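
To make the output rules above concrete, here is a small Python model
of two pieces of the plan. It is a sketch under stated assumptions, not
the C implementation: `show_cdup` assumes the query runs inside the
work tree and that `prefix` is the current subdirectory relative to the
top level with a trailing slash (empty at the top level), and `render`
assumes the boolean options print `true`/`false` as `git rev-parse`
does today. The function names and the JSON key naming are
hypothetical; the final key scheme is still to be decided.

```python
import json


def show_cdup(prefix):
    """One "../" per directory level in `prefix` (e.g. "a/b/" -> "../../")."""
    return "../" * prefix.count("/")


def format_bool(value):
    """Boolean queries are printed as the words "true" or "false"."""
    return "true" if value else "false"


def render(results, fmt="text"):
    """Render query results given as a mapping of option name -> str or bool.

    Text mode prints one value per line, in request order. JSON mode
    (stretch goal; key naming is an assumption) emits a single object.
    """
    if fmt == "json":
        return json.dumps(results, sort_keys=True)
    lines = [format_bool(v) if isinstance(v, bool) else v for v in results.values()]
    return "\n".join(lines) + "\n"
```

Keeping the result collection separate from rendering, as in this
model, is one way the text output could stay byte-compatible while a
structured format is added later.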

## Detailed Project Timeline

**Phase 0: Pre-Acceptance Preparation (April 9 - May 7, 2025)**

* **Focus:** Demonstrate continued interest and deepen understanding
*specifically of `rev-parse`'s internals* while awaiting results.
* **Activities:**
* **(April 9 - April 21):** Deep dive into `builtin/rev-parse.c`,
identifying the exact code blocks implementing the "OPTIONS FOR
FILES". Trace how they use `startup_info`, `prefix`, path functions,
and repository flags.
* **(April 22 - May 7):** Continue monitoring the mailing list. Refine
understanding of Git's testing framework, specifically focusing on
tests for `rev-parse` options (primarily `t1500-rev-parse.sh`; other
scripts such as `t1006-cat-file.sh` and `t5601-clone.sh` might use
some flags). Review contribution guidelines.

**Phase 1: Final Planning (May 8 - May 26, 2025 Approx.)**

* **Focus:** Formal introductions, confirm final scope & plan, setup.
* **Activities:**
* **(Week 1: May 8 - May 12):** Introduction with mentor(s). Confirm
the exact list of `rev-parse` options to be migrated. Discuss the
preferred approach for handling deprecation in `rev-parse` docs/code.
Discuss potential for shared helper functions vs. direct code
migration.
* **(Week 2: May 13 - May 19):** Set up dev environment. Deep dive
into the agreed-upon functions/code blocks within `rev-parse.c`.
Outline the basic structure for `builtin/repo-info.c` and the test
script `t/tXXXX-repo-info.sh`.
* **(Week 3: May 20 - May 26):** Implement the basic `cmd_repo_info`
skeleton, option parsing setup, and repository setup boilerplate.
Write initial "no-op" tests. Post first blog update.

**Phase 2: Implementation in Batches (Coding Weeks 1-8: May 27 - July
21, 2025 Approx.)**

* **Focus:** Implement options in logical groups, test thoroughly,
submit patches early and often.
* **GSoC Milestone:** Midterm Evaluations occur around Week 8.
* **Activities:**
* **(Batch 1 / Weeks 1-2: May 27 - June 9):** Implement basic path
queries: `--show-toplevel`, `--show-prefix`, `--show-cdup`. Add tests
comparing output with `rev-parse`. **Submit Patch Series 1**.
* **(Batch 2 / Weeks 3-4: June 10 - June 23):** Implement directory
queries: `--git-dir`, `--git-common-dir`, `--resolve-git-dir <path>`.
Add tests. **Submit Patch Series 2**. Write blog post update.
* **(Batch 3 / Weeks 5-6: June 24 - July 7):** Implement boolean state
queries: `--is-bare-repository`, `--is-inside-git-dir`,
`--is-inside-work-tree`. Add tests checking the printed `true`/`false`
values and behavior in various locations. **Submit Patch Series 3**.
* **(Batch 4 / Weeks 7-8: July 8 - July 21):** Implement remaining
queries: `--is-shallow-repository`, `--shared-index-path`,
`--show-superproject-working-tree`. Add comprehensive tests covering
interactions (e.g., in submodules, shallow clones). **Submit Patch
Series 4**. Prepare for Midterm evaluation; ensure submitted batches
demonstrate core progress. Write blog post update.

**Phase 3: Documentation & Final Polish (Coding Weeks 9-12: July 22 -
Aug 18, 2025 Approx.)**

* **Focus:** Create documentation, address feedback on all patches,
refine implementation, potentially attempt stretch goal.
* **Activities:**
* **(Week 9: July 22 - July 28):** Write the first complete draft of
the man page for `git-repo-info`. Draft the necessary updates for
`git-rev-parse.adoc` (deprecation notices). **Submit Patch Series 5
(Documentation)**.
* **(Week 10: July 29 - Aug 4):** Focus on addressing review comments
on **all** previous patch series. Refactor code based on feedback.
Ensure test suite is robust and covers feedback points.
* **(Week 11: Aug 5 - Aug 11):** *Stretch Goal (Conditional):* If core
functionality and docs are stable and reviewed positively, begin
investigating/implementing `--format=json`. Add specific JSON tests if
implemented. Otherwise, focus on further code cleanup and test
hardening.
* **(Week 12: Aug 12 - Aug 18):** Prepare and submit final versions of
all patch series, incorporating all feedback. Final testing pass.
Write blog post update summarizing progress and final state. Code
freeze for final evaluation.

**Phase 4: Final Evaluation & Wrap-up (Aug 19 - Nov 19, 2025)**

* **Focus:** Final submissions, respond to late feedback, ensure
project completion.
* **Official GSoC Milestone:** November 19, 2025 - Program End Date.
* **Activities:**
* **(Late Aug - Sept):** Submit final GSoC evaluations. Actively
respond to any further comments on submitted patches from the
community/maintainers, aiming for merge readiness.
* **(Oct - Nov 19):** Monitor mailing list for patch status. Write
final GSoC project summary blog post. Continue engaging with the
community if interested in further contributions beyond GSoC.



Thank You,
Jayatheerth



