[GSoC] [RFC] Project Proposal v2: Refactoring in order to reduce Git's global state

From: Anthony Wang <anthonywang513@xxxxxxxxx>

Hi all,

I'm interested in working with Git over the summer to "Refactor in Order 
to Reduce Git's Global State." My main motivation for choosing this idea 
is that improving Git's environment handling will enhance long-term 
maintainability and scalability, and allow for multiple-repository 
interactions. 

This is the second version of my proposal. I would love to 
receive feedback on the content or structure, especially on the points 
pertaining to the actual implementation of the project and planned work. 
I will also discuss my past experience, and speak on why I believe I 
will be able to effectively contribute in a sustainable way to this 
large undertaking.
Thanks!

-------------------------------------

# Refactoring in Order to Reduce Git’s Global State

### Personal Info
Name: Anthony Wang
Timezone: Eastern Time (ET)/UTC -5
GitHub: https://github.com/wang-anthony03
LinkedIn: https://www.linkedin.com/in/anthonywang03/

## About Me

My name is Anthony Wang, and I am a 3rd year Computer Science student 
at the University of Virginia. I have experience in software engineering 
and development, particularly in C, Python, and shell scripting. This is 
my first time working with open source software, and I am incredibly 
excited to contribute to Git, as I have always wanted to work on 
developing the tools that I use every day.

My background includes building scalable automation tools and 
contributing to infrastructure projects at Verizon. I was able to work 
with large codebases, and I learned the importance of clean, 
well-documented code, as well as the challenges of tech-debt leading to 
difficulties in maintaining code.

## Previous Experience

- Experience using Git extensively in academic and personal projects.
- Experience working with C as the main language for multiple Computer 
Systems courses.
- Developed a text editor in C. [1]

## Microproject
- t8911: avoid using pipes and improve code clarity
Thread: https://public-inbox.org/git/20250405103718.25160-1-anthonywang03@xxxxxxxxxx/
Status: undergoing discussion  
Description: To surface failures that were previously hidden, we stop 
piping the output of `git tag` into other commands, since a pipeline 
discards the exit code of `git tag`. In addition, we drop the `-q` flag 
from `grep` invocations for clarity, and replace `grep` with `test_grep` 
to provide helpful debug output in case of test failure.


## Project Proposal

### Objective
This project aims to modernize Git's environment handling by refactoring 
the environment.c code to reduce the reliance on global state. The goal 
is to move environment variables and configuration settings from the 
global scope into appropriate local contexts, primarily within struct 
repository and struct repository_settings. This architectural 
improvement will:

- Enhance code maintainability by making dependencies explicit.
- Reduce the risk of unintended side effects from global state 
modifications.
- Improve Git's ability to handle multiple repositories within the same 
process.
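
To make the last two points concrete, here is a minimal sketch of the problem and the intended direction. Every name in it (`g_abbrev_len`, `struct sketch_repo`, and the helper functions) is hypothetical and chosen only for illustration; none of it is Git's actual code.

```c
#include <assert.h>

/* Hypothetical file-scope global, in the style of those in environment.c. */
static int g_abbrev_len = 7;

/* With a global, opening any repository overwrites the one shared value. */
void global_open_repo(int configured_abbrev)
{
	g_abbrev_len = configured_abbrev;
}

int global_abbrev_len(void)
{
	return g_abbrev_len;
}

/* The same setting stored per repository (hypothetical struct). */
struct sketch_repo {
	int abbrev_len;
};

/* Opening a repository now only touches that repository's own state. */
void sketch_open_repo(struct sketch_repo *repo, int configured_abbrev)
{
	assert(repo);
	repo->abbrev_len = configured_abbrev;
}
```

With the global version, opening a second repository silently changes the value seen by code still working on the first one. With the struct version, two repositories opened in the same process each keep their own value.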

### Expected Project Size: 90 or 175 Hours

## Key Tasks

1. Identifying Global State Variables
- Analyze environment.c and related files to locate global variables.
- Identify the functions and modules that rely on these global 
variables, and list all files using each variable.
- Categorize these variables based on their use cases and potential 
migration targets.
- Write documentation listing all desired variable migrations, allowing 
for community contribution.
- Discuss prioritization with the community and designate priority of 
tasks.

2. Refactoring Process
- Move identified global variables into struct repository or struct 
repository_settings.
- Ensure proper initialization and access mechanisms to maintain 
current behavior.
- Update affected code paths by modifying the functions and modules 
that rely on the old global state.
- Ensure all relevant operations correctly pass repository-specific 
context.

3. Testing and Validation
- Run Git's extensive test suite to verify functionality remains intact.
- Update or create new tests as needed to cover refactored components.

4. Documentation
- Document changes in Git's developer notes.
- Provide clear explanations for new structures and access patterns.

5. Challenges and Considerations
- Ensuring backward compatibility and avoiding regressions.
- Handling dependencies between different parts of the codebase that 
rely on global state.
- Keeping performance overhead minimal while introducing structured 
state management.

## Planned Implementation
I have been examining the codebase to search for global variables 
and have found two major groups:

The first group is a single variable, `the_repository`. Removing the 
dependencies on it means working bottom-up. The first approach is to 
start at the lowest-level functions and change them to take the data 
they need as parameters instead of reading it from `the_repository`. 
Alternatively, where the dependency can be satisfied from a different 
source, such as an already available `struct repository` or other 
context-dependent information as in [2] or [3], adapt the code to take 
the information from that source and remove the dependency entirely. 
Following that, move up one level to the callers and repeat the 
process.
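
As a rough illustration of this bottom-up approach, the sketch below uses toy stand-ins rather than Git's real definitions: `struct toy_repo`, `the_toy_repo`, and all three functions are hypothetical names invented for this example. It converts the lowest-level helper first, then its caller, and keeps a thin wrapper for call sites that have not been migrated yet.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for a repository struct; the single field is hypothetical. */
struct toy_repo {
	const char *gitdir;
};

/* Toy stand-in for a global default repository pointer. */
static struct toy_repo toy_default_repo = { ".git" };
static struct toy_repo *the_toy_repo = &toy_default_repo;

/*
 * Step 1 (lowest level): the leaf helper receives the repository as a
 * parameter instead of reading the global.
 */
int toy_repo_object_dir(const struct toy_repo *repo, char *buf, size_t len)
{
	assert(repo && repo->gitdir);
	return snprintf(buf, len, "%s/objects", repo->gitdir);
}

/*
 * Step 2 (one level up): the caller also takes the repository and
 * passes it down. Returns -1 if the object directory does not fit
 * in the scratch buffer.
 */
int toy_repo_pack_dir(const struct toy_repo *repo, char *buf, size_t len)
{
	char objdir[256];
	int n = toy_repo_object_dir(repo, objdir, sizeof(objdir));

	if (n < 0 || (size_t)n >= sizeof(objdir))
		return -1;
	return snprintf(buf, len, "%s/pack", objdir);
}

/*
 * Transitional wrapper: the only place that still touches the global,
 * kept so that unconverted callers continue to work.
 */
int toy_pack_dir(char *buf, size_t len)
{
	return toy_repo_pack_dir(the_toy_repo, buf, len);
}
```

Once every caller passes a repository explicitly, the wrapper has no users left and can be deleted, which removes the last reference to the global from this call chain.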

The second group is config variables. Because these are global 
variables used by multiple subsystems, refactoring them is a bit more 
complicated. For code with access to the repository, relocate the 
config variable into the corresponding struct that stores that 
information. For code without guaranteed access to a repository, move 
the variable into a structure appropriate to the local context and set 
it in that structure's constructor, guaranteeing access to the 
variable. This workflow draws heavily from [4].
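
The two cases could look roughly like the sketch below. It is only an illustration under stated assumptions: `struct demo_settings`, `struct demo_repo`, `struct demo_matcher`, and the `ignore_case` setting are all hypothetical, and the string field stands in for a real config lookup.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical per-repository settings holding a former global. */
struct demo_settings {
	int initialized; /* nonzero once the values have been loaded */
	int ignore_case; /* previously a file-scope global */
};

struct demo_repo {
	const char *ignore_case_config; /* stands in for a config lookup; NULL if unset */
	struct demo_settings settings;
};

/* Case 1: code that has a repository loads the setting lazily, once. */
void demo_prepare_settings(struct demo_repo *repo)
{
	assert(repo);
	if (repo->settings.initialized)
		return;
	repo->settings.ignore_case = repo->ignore_case_config &&
		!strcmp(repo->ignore_case_config, "true");
	repo->settings.initialized = 1;
}

int demo_repo_ignore_case(struct demo_repo *repo)
{
	demo_prepare_settings(repo);
	return repo->settings.ignore_case;
}

/*
 * Case 2: code without guaranteed repository access gets its own
 * context struct, and the constructor takes the value explicitly.
 */
struct demo_matcher {
	int ignore_case;
};

void demo_matcher_init(struct demo_matcher *m, int ignore_case)
{
	m->ignore_case = ignore_case;
}

int demo_matcher_equal(const struct demo_matcher *m, char a, char b)
{
	if (m->ignore_case)
		return tolower((unsigned char)a) == tolower((unsigned char)b);
	return a == b;
}
```

A caller that does have a repository can bridge the two cases by passing `demo_repo_ignore_case(repo)` to `demo_matcher_init()`, so the lower-level code never needs to know where the value came from.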


## Schedule
1. **Now -- May 5th**: Exploration of codebase
- Research and familiarize myself with environment.c and related code.
- Identify global variables to refactor.
- Engage with the Git community for feedback.

2. **May 6th -- June 1st**: Community bonding
- Get in touch with the mentors;
- Present to the community a first list of variable migrations;
- Receive feedback from the community and modify project plans;
- Present potential changes to ensure they align with community goals;

3. **June 2nd -- July 7th**: First coding period
- Eliminate dependencies on `the_repository` in priority order;
- Identify possible patterns in refactoring and document for future 
contributors;

4. **July 8th -- August 10th**: Second coding period
- Move other global variables into local repository contexts;
- Identify possible patterns in refactoring and document for future 
contributors;

5. **August 11th -- August 25th**: Documentation period
- Finalize and refine changes;
- Document changes, the process, and outline future work regarding 
reducing the global namespace;

## Availability
I will be working over the summer, but I regularly code and study 
outside of work/school hours, and I will dedicate all of that time to 
this project. Because I will not be able to commit a very large number 
of hours, I have chosen an expected project size of 90 hours, or up to 
175 hours if given an extended timeline. I will be fully available for 
this task outside of working hours, and will set up a compatible 
schedule with my mentor(s) in order to ensure that an effective line of 
communication is established.

I appreciate your time in reviewing this proposal and look forward to 
your feedback!
Thank you, 

Anthony Wang
(He/him/his)

References:
-----------

[1] https://github.com/wang-anthony03/Quill

[2] https://lore.kernel.org/git/20250310-b4-pks-objects-without-the-repository-v4-6-f201b8ec57ba@xxxxxx/

[3] https://lore.kernel.org/git/20250310-b4-pks-objects-without-the-repository-v4-1-f201b8ec57ba@xxxxxx/

[4] https://public-inbox.org/git/342a26572d6372d40ba563d73242e0d18a481d2a.1733236936.git.karthik.188@xxxxxxxxx/






