Re: [PATCH 0/4] update MyFirstContribution with current code base

JAYATHEERTH K <jayatheerthkulkarni2005@xxxxxxxxx> · Fri, 16 May 2025 22:48:30 +0530

On Fri, May 16, 2025 at 9:41 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> JAYATHEERTH K <jayatheerthkulkarni2005@xxxxxxxxx> writes:
>
> > On Fri, May 16, 2025 at 4:09 AM Emily Shaffer <nasamuffin@xxxxxxxxxx> wrote:
> >
> >> Mostly I lurk these days :) I do still keep an eye on the list. Will
> >> happily take a look at your series tomorrow, I'm out of time for
> >> today. But per what I mention below, if you don't hear from me, please
> >> don't feel blocked by the review, as I think the MyFirstContribution
> >> doc is comfortably maintained by the whole project by now.
> >>
> >
> > Understood!! thanks for letting me know
> >
> >> > So for now I will cc Philippe
> >>
> >> For what it's worth, I don't think it is harmful to CC people even if
> >> they will be inactive. CCing someone is not necessarily the same thing
> >> as saying that person needs to approve your code change, right? So I
> >> don't see the harm in CCing with low expectations - in fact, in my
> >> case it would help make the email stand out, so you'd be more likely
> >> to get a review from me (I missed this thread going by initially).
> >>
> >>
> >
> > Oh, ok I will keep that in mind next time.
> >
> >>  - Emily
> >
> > Thank you,
>
> Thanks for a pleasant conversation; others can also learn from this
> exchange, hopefully.  In Documentation/SubmittingPatches we have
> "Choosing your reviewers" section lacks anything more concrete than
> "who are involved in the area you are touching", and those who use
> common sense may say, just like you did, "ah, most of the text I am
> replacing was written N years ago by person X, whom I no longer see
> on the list very often" and decide to omit it.  Perhaps we would
> want to enhance the text there somewhat?  I dunno.
>

Agreed. Even a single practical example in the "Choosing your
reviewers" section of SubmittingPatches could guide contributors
better.
I'd be happy to draft a patch that adds such an example, based on the
discussion in this thread.

> Since there were discussions on contrib/contacts recently (a few of
> the participants there added to CC), I tried it and unfortunately I
> was not very impressed by its output [*].
>
> After applying the four patches on top of 'master', you'd run the
> tool like so:
>
>     $ contrib/contacts/git-contacts master..
>     Jonathan Nieder <jrnieder@xxxxxxxxx>
>     Jacob Stopak <jacob@xxxxxxxxxxxxxxxx>
>     Jeff King <peff@xxxxxxxx>
>     Jean-Noël Avila <jn.avila@xxxxxxx>
>     Emily Shaffer <nasamuffin@xxxxxxxxxx>
>     Atharva Raykar <raykar.ath@xxxxxxxxx>
>     Junio C Hamano <gitster@xxxxxxxxx>
>     Todd Zullinger <tmz@xxxxxxxxx>
>     Kyle Lippincott <spectral@xxxxxxxxxx>
>
> The tool gave output in a different order every time it was run.  It
> wasn't obvious what the ordering meant.
>
> By looking at its source, I can tell that the names and addresses
> are collected from trailers like reported-by, which are counted with
> the same importance as the authorship, that the reason why the
> output is different each time it is run is due to use of keys %hash
> in a Perl script, etc., but counting sign-off would mean that I'd be
> summoned for each and every change related in this project, which
> would not be very productive use of everybody's time.
>

Agreed, though I am not aware of any project whose commits lack
author names and direct commit details.
And when two commits are involved, the output would probably be even
more confusing.
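
A rough Python sketch of the weighting idea, to make the point
concrete. It is not how contrib/contacts works today: the role
weights, the ROLE_WEIGHTS table and the rank_contacts() name are all
hypothetical, chosen only for illustration. Sign-offs get zero weight
here, and ties are broken by name so the order is the same on every
run.

```python
from collections import defaultdict

# Hypothetical weights, for illustration only. A sign-off counts for
# nothing, so a maintainer who signs off every patch is not listed
# for that reason alone.
ROLE_WEIGHTS = {
    "author": 3,
    "reviewed-by": 2,
    "helped-by": 2,
    "reported-by": 1,
    "signed-off-by": 0,
}


def rank_contacts(mentions):
    """Rank identities from an iterable of (role, identity) pairs.

    Returns a list of (identity, score) pairs, highest score first.
    Identities whose total score is zero are dropped.
    """
    scores = defaultdict(int)
    for role, identity in mentions:
        # Unknown roles contribute nothing.
        scores[identity] += ROLE_WEIGHTS.get(role.lower(), 0)
    # Sort by descending score, then by identity, so that the output
    # order does not depend on hash iteration order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(who, score) for who, score in ranked if score > 0]
```

The (role, identity) pairs would have to come from the authorship and
trailers of the commits being touched; how to collect them is left
out of this sketch.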

> And it of course is not clear who are still active in the recent
> past and why the name was in the list (it would not be as productive
> to ask for a review from somebody who was listed for reporting many
> problems in the area affected by the proposed patch than those who
> wrote the original) from this output.  There may want an "explain"
> mode that lets you feed a patch and get observations like:
>
>     The majority of lines you are touching haven't changed much
>     since person X wrote commit W 5 years ago, and the text turned
>     into current shape with contributions by person Y and Z.  Here
>     are the URLs into the lore archive for the discussion that you
>     can see how X, Y, and Z participated in the original before you
>     touched.  You may also want to look at commit V and U as well.
>
>     Last time we saw person X, Y, and Z on the list were ..., here
>     are the URLs into the lore archive.
>
> Perhaps some AI minded folks can write such a service for us ;-)?
>

If we're talking about AI approaches, I do think this could be
feasible with LLMs. I imagine a pipeline where:

  1. A patch is parsed and matched to the line-level history (via
     git blame or git log -L).
  2. The commit history is summarized to extract contributor roles.
  3. Activity is cross-checked on lore.kernel.org.
  4. An LLM generates human-readable explanations with references
     and confidence indicators.
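
As a minimal sketch of the first step only, the Python below tallies
who last touched each line in a range, using the output of
"git blame --line-porcelain". The function names blame_porcelain()
and summarize_blame() are made up for this example. Note that the
author-time recorded here is the date of the blamed commit, which is
at best a rough proxy for when that person was last active on the
list.

```python
import subprocess
from collections import Counter


def blame_porcelain(path, start, end, rev="HEAD"):
    """Return raw `git blame --line-porcelain` text for a line range."""
    return subprocess.run(
        ["git", "blame", "--line-porcelain",
         "-L", f"{start},{end}", rev, "--", path],
        check=True, capture_output=True, text=True,
    ).stdout


def summarize_blame(porcelain):
    """Summarize porcelain blame text per author.

    Returns (identity, line_count, latest_author_time) tuples, sorted
    by descending line count and then by identity.
    """
    lines_by_author = Counter()
    latest_time = {}
    name = mail = None
    when = 0
    for line in porcelain.split("\n"):
        if line.startswith("author "):
            name = line[len("author "):]
        elif line.startswith("author-mail "):
            mail = line[len("author-mail "):]
        elif line.startswith("author-time "):
            when = int(line[len("author-time "):])
        elif line.startswith("\t"):
            # A tab-prefixed line is the file content itself and
            # closes the header group for one blamed line.
            key = f"{name} {mail}"
            lines_by_author[key] += 1
            latest_time[key] = max(latest_time.get(key, 0), when)
    ordered = sorted(lines_by_author.items(),
                     key=lambda kv: (-kv[1], kv[0]))
    return [(who, count, latest_time[who]) for who, count in ordered]
```

Something like summarize_blame(blame_porcelain(path, 10, 40)), with a
path and line range taken from the patch, would then give the raw
facts that the later steps summarize.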

Of course, the risk of hallucinations is real, but with a properly
curated context (e.g., logs and emails as input, strict templates), I
think we can keep it grounded.
I'd like to prototype such a tool and would value the list's feedback
on this idea.

Also, I think this idea only makes sense as a separate tool rather
than as part of Git itself, because running LLMs locally would cost a
lot of compute. Alternatively, if we do combine it with Git, we could
handle it the way email is handled, with configuration set up
beforehand, and give people an option to point it at an API for
their own LLM.
But this is just a vague idea.

Thanks again; I truly find this thread constructive.

>
> [Footnote]
>
>  * I didn't try other alternatives which I didn't have, and the
>    other thread there was a mention of "git related" with "seems
>    like rather more work".
>
>    cf. https://lore.kernel.org/git/aBr9bwNQ1J46NNXI@xxxxxx/

-Jayatheerth