Re: [GSoC][Proposal RFC v2] Consolidate ref-related functionality into git-refs

Hi Everyone,
This is my proposal for "Consolidate ref-related functionality into
git-refs" in Google Summer of Code 2025. The Doc version:
https://docs.google.com/document/d/1Nfg6Dner1eU10LIlhkSJ5-N31Y6QuWj8bad_bAiMBa8/edit?usp=sharing

I'd appreciate any feedback on this.

Thanks,
Meet

---------8<----------8<----------8<----------8<----------8<----------8<----------8<----------8<
GSoC 2025 @ Git | Meet Soni
Consolidate ref-related functionality into git-refs
---------------------------------------------------


Personal Information:
---------------------
Name: Meet Soni
E-mail: meetsoni3017@xxxxxxxxx
Mobile No.: +91 9054520887

Education: Silver Oak University, Gujarat, India
Year: III/IV
Degree: Bachelors in Computer Engineering

Time-Zone: UTC + 5:30 (IST)
GitHub: github.com/inosmeet
Blog: inosmeet.github.io
LinkedIn: https://www.linkedin.com/in/meet-soni-4230701b9/


Pre-GSoC:
---------

I started exploring Git’s codebase in November 2024 by reading the
documentation and previous patches, and began contributing in mid-December 2024.

Following is the list of contributions that I have made:

* [PATCH v3] t7611: replace test -f with test_path_is* helpers
  Status: Merged into master
  Merge Commit: cef3d4a89f8d21fae6669822cbb540927020d93b
  Description:
    This patch is my first contribution, made to fulfill the microproject
    criteria. It improves test t7611-merge-abort.sh by converting old-style
    path checks to modern helper functions, which makes failures easier to
    debug.
  Mailing list thread:
    https://lore.kernel.org/git/20241227105345.10184-1-meetsoni3017@xxxxxxxxx/

* [GSoC][PATCH v4 0/5] refspec: centralize refspec-related logic
  Status: Merged into master
  Merge Commit: e4f6ab008522c5ad386485720770b8d03b4fb880
  Description:
    This patch series addresses a design inconsistency noted by Patrick, where
    the refspec-related logic was scattered across multiple headers, by renaming
    and relocating this logic for improved cohesion. In particular, functions
    such as omit_name_by_refspec() have been renamed to better reflect their
    intended functionality, and the core refspec-related routines have been
    moved from remote.c into a dedicated refspec.c file, ensuring a clear
    separation of concerns and more maintainable code.
  Mailing list thread:
    https://lore.kernel.org/git/20250204040558.34766-1-meetsoni3017@xxxxxxxxx/

* [GSoC][PATCH v2] remote: relocate valid_remote_name
  Status: Merged into master
  Merge Commit: f21ea69d945f958704f2fe143c2638ecae6e0d12
  Description:
    This patch, prompted by Junio's feedback in my previous patch series,
    moves the valid_remote_name() function from refspec to remote to
    centralize functionality related to remote repositories, thereby
    maintaining a clearer separation of concerns.
  Mailing list thread:
    https://lore.kernel.org/git/20250204142852.13035-1-meetsoni3017@xxxxxxxxx

* [GSoC][PATCH v2] merge-recursive: optimize time complexity for process_renames
  Status: Merged into master
  Merge Commit: b07dd9078b8ba5f3b7f5c88f84f7ee9c34fa65e1
  Description:
    This patch reduces time complexity for process_renames() from O(n^2) to
    O(n log n) when building a sorted string_list by constructing it unsorted
    and sorting it afterward, thereby addressing a previously noted TODO
    comment.
  Mailing list thread:
    https://lore.kernel.org/git/20250214044129.15282-1-meetsoni3017@xxxxxxxxx/

* [GSoC][PATCH v2] refspec: clarify function naming and documentation
  Status: Merged into master
  Merge Commit: 044b6f04f23d6c7e3c3750c9829db96b71470874
  Description:
    This patch renames a function and its parameters to improve clarity and
    consistency in refspec matching, addressing earlier feedback from Junio to
    resolve documentation ambiguities and enhance overall code readability.
  Mailing list thread:
    https://lore.kernel.org/git/20250215084539.73799-1-meetsoni3017@xxxxxxxxx

* [GSoC PATCH v5 0/3] reftable: return proper error codes from block_writer_add
  Status: Merged into next
  Merge Commit: 27571684ddca217d65c5f39947f20b9f5ec91863 (next)
  Description:
    This patch series refines error handling by eliminating assumptions about
    the error codes returned by block_writer_add(), a change motivated by a
    TODO comment to ensure more robust and flexible behavior.
  Mailing list thread:
    https://lore.kernel.org/git/20250319152927.1263033-1-meetsoni3017@xxxxxxxxx

* [GSoC][RFC PATCH] show-branch: use commit-slab for flag storage
  Status: On-hold from my side
  Description:
    This patch attempts to replace direct access to commit->object.flags with
    a commit-slab mechanism by introducing get_commit_flags() and
    set_commit_flags() for flag management and ensuring the canonical
    UNINTERESTING definition is used. It was initially prompted by a TODO
    comment, but review feedback broadened the scope to several aspects that
    were new to me. While I was investigating these changes, I was selected
    for the LFX mentorship, which reduced the time I had available, so I
    temporarily deprioritised further refinements. I plan to resume this
    work in the near future.
  Mailing list thread:
    https://lore.kernel.org/git/20250217055049.9217-1-meetsoni3017@xxxxxxxxx/

* [Practice PATCH] refs: add list subcommand
  Status: As PR on my fork
  Description:
    As it would have been inappropriate to send this patch to the mailing
    list before being selected for GSoC, I pushed it to my fork instead. It
    is a proof-of-concept practice patch that showcases my abilities and
    helped me better understand the project scope. I refer to it later in
    this proposal.
  PR link:
    https://github.com/inosmeet/git/pull/1

* Reviewed fellow contributors’ patches:
  Mailing list threads:
  * https://lore.kernel.org/git/CAPhwyn0tGHuX_Gh=rno9wj8fLb6zG4M3QAZyQDQ8qZyE+Uyg_Q@xxxxxxxxxxxxxx/
  * https://lore.kernel.org/git/CAPhwyn2qeN_tZOEyhD6=TLEdQbcCEV1thxpDwNzApqaET0+5og@xxxxxxxxxxxxxx/
  * https://lore.kernel.org/git/CAPhwyn03LbYexkk4YsaC6F2H_m6o73fU6aQ-c0urfdAsyEqPMg@xxxxxxxxxxxxxx/
  * https://lore.kernel.org/git/CAPhwyn0Sq0hDktPtf53Qs6LKwNsmn6yXuVyEfcYzyXK4yjd7HA@xxxxxxxxxxxxxx/
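
The process_renames() change listed above relies on a general pattern:
appending n items to an unsorted list and sorting once costs O(n log n) with a
typical sort, whereas keeping the list sorted on every insertion may shift up
to n entries each time, giving O(n^2) overall. Below is a minimal standalone
sketch of that pattern. It uses plain C arrays and qsort() rather than Git's
string_list API, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b)
{
        const char *const *sa = a;
        const char *const *sb = b;

        return strcmp(*sa, *sb);
}

/*
 * Slow pattern: keep the array sorted after every insertion.
 * Each call may shift up to *nr entries, so n calls cost O(n^2).
 * The caller must provide room for one more entry.
 */
static void insert_sorted(const char **items, size_t *nr, const char *item)
{
        size_t pos;

        assert(items && nr && item);
        pos = *nr;
        while (pos > 0 && strcmp(items[pos - 1], item) > 0) {
                items[pos] = items[pos - 1];
                pos--;
        }
        items[pos] = item;
        (*nr)++;
}

/*
 * Fast pattern: append everything unsorted (O(1) per item), then sort
 * once at the end. The caller must provide room for input_nr entries.
 */
static void append_then_sort(const char **items, size_t *nr,
                             const char *const *input, size_t input_nr)
{
        size_t i;

        assert(items && nr && input);
        for (i = 0; i < input_nr; i++)
                items[(*nr)++] = input[i];
        qsort(items, *nr, sizeof(*items), cmp_str);
}
```

Both helpers produce the same sorted result; only the amount of work differs
as the list grows.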


The Project:
------------
Currently, Git’s reference management is distributed among several commands,
including git update-ref, git show-ref, git for-each-ref, git pack-refs, git
symbolic-ref and git check-ref-format. The functionality of these commands is
implemented using the functions from refs.[c|h].

This project aims to streamline reference management in Git by consolidating
functionality currently spread across multiple commands into a single git-refs
command. The project-idea page specifically mentions update-ref, show-ref,
for-each-ref, and pack-refs; therefore, this project will initially focus on
these four commands, with the remaining ones treated as stretch goals (see
the "Stretch goals" section below).

The updated git refs command will offer subcommands to list, retrieve,
verify/check existence, write, and optimise references.

Since the project involves developing new subcommands, selecting appropriate
names and addressing design considerations will be one of its primary
challenges.


The Plan:
---------
To tackle this project, the consolidation of each command will be divided into
the following steps:

    1. Create subcommands:
      Develop the actual subcommand under the refs command (builtin/refs.c)
      by leveraging core functions from the refs module. The implementation
      will mimic the logic of the legacy commands where it makes sense for
      consistency with existing behavior, but we'll also evaluate opportunities
      for improvement where deviating from the legacy behavior could be
      beneficial.
    2. Tests:
      Develop comprehensive tests to verify that the new subcommands function as
      expected. This will involve creating a range of tests, including shell
      scripts and/or unit tests located in the t/ directory. For the existing
      tests covering the legacy commands targeted for consolidation, the plan is
      to retain them initially, even if this results in some duplication, with
      the intention of deprecating them gradually over time.
    3. Documentation:
      Develop comprehensive documentation for the newly implemented
      subcommands to ensure clear guidance for users and maintainers.
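
Step 1 can be pictured as a small dispatch table that maps a subcommand name
to its handler. The sketch below is a standalone simplification: Git's own
builtins rely on the project's parse-options machinery rather than a
hand-rolled table, and the handler names and return values here are
hypothetical placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*subcommand_fn)(int argc, const char **argv);

/* Hypothetical handlers; real ones would call into the refs module. */
static int cmd_refs_list(int argc, const char **argv)
{
        (void)argc;
        (void)argv;
        return 0;
}

static int cmd_refs_exist(int argc, const char **argv)
{
        (void)argc;
        (void)argv;
        return 0;
}

static int cmd_refs_optimize(int argc, const char **argv)
{
        (void)argc;
        (void)argv;
        return 0;
}

struct subcommand {
        const char *name;
        subcommand_fn fn;
};

static const struct subcommand subcommands[] = {
        { "list", cmd_refs_list },
        { "exist", cmd_refs_exist },
        { "optimize", cmd_refs_optimize },
};

/* Return the handler for `name`, or NULL if it is not a known subcommand. */
static subcommand_fn find_subcommand(const char *name)
{
        size_t i;

        assert(name);
        for (i = 0; i < sizeof(subcommands) / sizeof(subcommands[0]); i++)
                if (!strcmp(subcommands[i].name, name))
                        return subcommands[i].fn;
        return NULL;
}

/*
 * Treat argv[0] as the subcommand name and pass the remaining
 * arguments to its handler. Returns -1 if the name is missing or unknown.
 */
static int run_refs(int argc, const char **argv)
{
        subcommand_fn fn;

        if (argc < 1)
                return -1;
        fn = find_subcommand(argv[0]);
        if (!fn)
                return -1;
        return fn(argc - 1, argv + 1);
}
```

Adding a new subcommand in this model means writing one handler and adding one
table entry, which is also where its tests and documentation would attach.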


Command mapping and naming:
---------------------------

I have taken reference for potential names for these subcommands from Patrick’s
suggestion (https://gitlab.com/gitlab-org/git/-/issues/330):

  # Replaces git-show-ref(1) and git-for-each-ref(1).
  $ git refs list

  # Replaces `git show-ref --exists`.
  $ git refs exist

  # Replaces `git show-ref --verify <ref>`.
  $ git refs show

  # Replaces git-symbolic-ref(1) to show a ref.
  $ git refs resolve

  # Replaces git-pack-refs(1).
  $ git refs optimize

  # Replaces git-update-ref(1).
  $ git refs write

  # Replaces git-check-ref-format(1).
  $ git refs check-name

Below is a list of the commands along with their associated subcommands/flags
that need to be considered for consolidation under this project:

git show-ref -> git refs list
Used to list references in a local repository.
* --abbrev
* --branches
* --tags
* --exists
* --verify
* --exclude-existing
* --dereference
* --head
* --hash
* --quiet


git for-each-ref -> git refs list
Used to output information on each ref, quite similar to show-ref.
* --count
* --shell|--perl|--python|--tcl
* --sort
* --format
* --include-root-refs
* --points-at
* --merged|--no-merged
* --contains|--no-contains
* --exclude
* --ignore-case
* --omit-empty


git pack-refs -> git refs optimize
Used to pack heads and tags for efficient repository access.
* --all
* --no-prune
* --auto
* --include
* --exclude


git update-ref -> git refs update/write
Used to update the object name stored in a ref safely.
Subcommands:
* [symref-]update
* [symref-]create
* [symref-]delete
* [symref-]verify
* option
* start
* prepare
* commit
* abort
Options/Flags:
* --stdin
* -m
* -d
* --no-deref
* --create-reflog


Options/Config sharing:
-----------------------
For sharing configuration options, I propose a structure-based approach that
centralizes common options while grouping subcommand-specific settings into
dedicated structures. Common options live in one shared structure (struct
refs_common_options), which each subcommand's own structure (such as struct
list_options) embeds alongside the options unique to that subcommand.

Below is a representative code snippet illustrating this approach:

struct refs_common_options {
        /* ... common options ... */
};

struct list_options {
        struct refs_common_options common;
        /* ... options specific to 'list' subcommand ... */
};

struct exist_options {
        struct refs_common_options common;
        /* ... options specific to 'exist' subcommand ... */
};

This design is tentative, and we will evaluate the best approach on a
case-by-case basis during implementation to ensure flexibility and efficiency.
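
To make the layout concrete, here is a compilable sketch with a few placeholder
fields. The field names (quiet, pattern, show_head) and the helpers
refs_common_init() and refs_is_quiet() are hypothetical and chosen only for
illustration; the actual set of shared options would be settled during
implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Options every subcommand would share (placeholder fields). */
struct refs_common_options {
        int quiet;            /* hypothetical: suppress output */
        const char *pattern;  /* hypothetical: optional ref pattern */
};

/* Options for the 'list' subcommand: common part plus its own. */
struct list_options {
        struct refs_common_options common;
        int show_head;        /* hypothetical: also show HEAD */
};

/* Options for the 'exist' subcommand: only the common part so far. */
struct exist_options {
        struct refs_common_options common;
};

/* Reset the shared options to their defaults. */
static void refs_common_init(struct refs_common_options *opts)
{
        assert(opts);
        opts->quiet = 0;
        opts->pattern = NULL;
}

/* Shared code only needs the embedded common part. */
static int refs_is_quiet(const struct refs_common_options *opts)
{
        assert(opts);
        return opts->quiet;
}
```

Because every subcommand structure embeds the same common part, helpers that
only need shared settings can take a pointer to it regardless of which
subcommand is running.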

Since we plan to maintain the legacy commands for the foreseeable future, we
must ensure backward compatibility while developing these new subcommands. This
makes me wonder whether the logic shared by the legacy commands and the new
subcommands should be factored out into a standalone, library-like unit that
both can call. However, this needs to be discussed thoroughly.

While studying these commands, I decided to try implementing one of them to
better understand the project’s requirements and scope and to get a firmer grip
on the codebase. I developed a reference implementation of git refs list that
mimics the behaviour of git show-ref with the [--head], [--branches|--heads]
and [--tags] flags; it can be found at https://github.com/inosmeet/git/pull/1.

Although this reference implementation is still a work in progress and not yet
ready for merging into master, it demonstrates my understanding of navigating
the Git codebase and the process of creating commands and subcommands. More
importantly, it indicates my ability to execute this project effectively.
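
As a rough model of what those three flags select, the filtering can be
sketched as a predicate over ref names. This is based on the documented
behaviour of git show-ref, not on the code in the fork or in Git itself, and
the enum values and the has_prefix() and ref_is_listed() helpers are
hypothetical names introduced for this example.

```c
#include <assert.h>
#include <string.h>

enum list_flags {
        LIST_HEAD     = 1 << 0,  /* models --head */
        LIST_BRANCHES = 1 << 1,  /* models --branches */
        LIST_TAGS     = 1 << 2   /* models --tags */
};

/* Return 1 if `str` begins with `prefix`, else 0. */
static int has_prefix(const char *str, const char *prefix)
{
        return !strncmp(str, prefix, strlen(prefix));
}

/* Return 1 if `refname` should be listed under `flags`, else 0. */
static int ref_is_listed(const char *refname, unsigned flags)
{
        assert(refname);

        /* HEAD is shown only when explicitly requested. */
        if (!strcmp(refname, "HEAD"))
                return !!(flags & LIST_HEAD);

        /* Without --branches/--tags, every ref under refs/ is listed. */
        if (!(flags & (LIST_BRANCHES | LIST_TAGS)))
                return has_prefix(refname, "refs/");

        /* The two filters are not mutually exclusive. */
        if ((flags & LIST_BRANCHES) && has_prefix(refname, "refs/heads/"))
                return 1;
        if ((flags & LIST_TAGS) && has_prefix(refname, "refs/tags/"))
                return 1;
        return 0;
}
```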


Timeline:
---------

Pre-GSoC (Until May 8):
* Continue to work on different things like the pending WIP patch that I have.
  Be engaged/involved in the community.

Community Bonding (May 8 - June 1):
* Talk with mentors and discuss potential names for the new subcommands and
  interface design. Start the consolidation early with mentors’ permission.

Phase I (2 June - 29 June):
* Consolidate functionality common to both the show-ref and for-each-ref
  commands, along with their subcommands/flags, into the refs... command.

Phase II (30 June - 13 July):
* Consolidate non-shared functionalities of show-ref and for-each-ref -- such
  as --exists, --verify and related options -- into refs... command.

Phase III (14 July - 27 July):
* Consolidate pack-refs command along with subcommands/flags to refs...
  command.

Phase IV (28 July - 24 August):
* Consolidate update-ref command along with subcommands/flags to refs...
  command.

Final Week (25 August - 01 September):
* Some final touches. Make a final report about the work accomplished and
  outline future work.

I have not allocated separate timeline slots for tests and documentation, as
these will be integrated into the patches that introduce the new subcommands.

I think if permitted to start early, I can consolidate one more command within
the GSoC period.


Related previous work:
----------------------

git-switch Command:
https://github.com/git/git/commit/d787d311
Mailing list thread:
https://lore.kernel.org/git/20190130094831.10420-1-pclouds@xxxxxxxxx/


git-restore Command:
https://github.com/git/git/commit/46e91b663badd99b3807ab34decfd32f3cbf15e7
Mailing list thread:
https://lore.kernel.org/git/20190308101655.9767-2-pclouds@xxxxxxxxx/


Stretch goals:
--------------

If the consolidation of the 4 commands is completed ahead of schedule, I would
like to consolidate the remaining commands (these can be done even after the
GSoC period). These commands include git symbolic-ref and git check-ref-format.

List of commands along with their subcommands/flags that need to be considered
for consolidation:

git symbolic-ref -> git refs resolve
Used for reading or updating symbolic references.
* -m
* --delete
* --quiet
* --short
* --recurse | --no-recurse


git check-ref-format -> git refs check-name
Used for validating whether a given reference adheres to Git’s naming
conventions for references.
* --[no-]allow-onelevel
* --normalize
* --refspec-pattern
* --branch
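
The kind of check this subcommand performs can be illustrated with a small
validator. The sketch below covers only a subset of the rules documented in
git-check-ref-format(1): it does not model --allow-onelevel, --normalize or
--refspec-pattern, and it checks the .lock suffix only at the end of the whole
name rather than per component. The function name refname_looks_valid() is
hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if `name` passes the subset of ref naming rules modelled
 * here, else 0. Passing this check does not guarantee that Git would
 * accept the name.
 */
static int refname_looks_valid(const char *name)
{
        size_t len, i;

        assert(name);
        len = strlen(name);

        /* Must be non-empty, not begin or end with '/', not end with '.'. */
        if (!len || name[0] == '/' || name[len - 1] == '/' ||
            name[len - 1] == '.')
                return 0;

        /* No "..", no consecutive slashes, no "@{" sequence. */
        if (strstr(name, "..") || strstr(name, "//") || strstr(name, "@{"))
                return 0;

        /* Must not end with ".lock". */
        if (len >= 5 && !strcmp(name + len - 5, ".lock"))
                return 0;

        /* No control characters, space, or any of ~ ^ : ? * [ \ */
        for (i = 0; i < len; i++) {
                unsigned char c = (unsigned char)name[i];

                if (c < 0x20 || c == 0x7f || strchr(" ~^:?*[\\", c))
                        return 0;
        }
        return 1;
}
```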


Blogging:
---------

Blogging is a good way to foster transparency and community engagement in
open-source projects. By sharing insights, challenges, and milestones, blog
posts not only document the project’s progress but also create opportunities for
feedback and collaboration from the broader community.

In my project, I plan to commit to bi-weekly blog posts. These updates will
detail my progress, share lessons learned, and highlight any obstacles
encountered along the way, ensuring that the development process remains open
and interactive.

The blog will be at: https://inosmeet.github.io


Post-GSoC:
----------

Over the past three months, the Git community has been an invaluable source of
support and learning, significantly contributing to my growth as a developer.
Even after the summer program, I intend to stay active in the mailing list and
continue contributing meaningful patches. I'm thinking of consolidating the
remaining commands after GSoC is over. I've also been following the early
efforts to integrate Rust components into Git via libgit-sys, and I would be
excited to contribute to that initiative as it matures.

Lastly, if given the opportunity, I would be delighted to mentor new
contributors and help the community grow even further.


Some credits to myself:
-----------------------

I am an open source enthusiast and have contributed to some other open-source
projects as well. I was selected as a GSoC 2024 mentee for the Python Software
Foundation’s cve-bin-tool sub-organization where I worked on improving Product
mapping using PURLs (Package URLs).

More recently, I got selected as an LFX 2025 term-1 mentee for Cloud Native
Computing Foundation’s Microcks sub-organization. Here I’m working on improving
Microcks’ delivery and validation using GitHub Actions CI deployment tests.

These opportunities have given me valuable initial exposure to open source
practices and community collaboration, and I’m excited to further build my
skills while contributing fresh ideas to this project.

Blogs for both my GSoC as well as LFX journey can be found at my blog site:
https://inosmeet.github.io/posts


Availability:
-------------

My current semester ends on 24th April, and my exam is tentatively scheduled
for 15th May, leaving me enough time to prepare for my GSoC project. If I am
selected, I will be able to work five days per week, 7-8 hours per day,
dedicating around 35-40 hours a week to the project.

The difficulty level of this project is medium, and the expected project size
is about 350 hours. At my proposed commitment of 35-40 hours per week, that
amounts to roughly 9-10 weeks of work within the 13-week period from 2 June to
1 September, which leaves a buffer for any unforeseen circumstances that may
arise during the project.

---
Thank you for considering my application. I am excited about the possibility of
contributing to this project and learning from the mentorship experience.

Thanks & Regards,
Meet




