On 7/25/25 22:53, Sasha Levin wrote:
> This patch series adds unified configuration and documentation for AI
> coding assistants working with the Linux kernel codebase. As AI tools
> become increasingly common in software development, it's important to
> establish clear guidelines for their use in kernel development.
>

Hello,

Thanks for bringing up this topic. I want to share my experience: I recently ran a small experiment working on various Linux kernel development tasks with LLM help. The slides, which I presented at a local 2600 conference in June 2025, are available on my Telegram channel [1].

[1] https://t.me/novitoll_ch/366

My tasks were:

- Analyze a syzkaller crash report (with and without an existing human patch)
- De-obfuscate a syzkaller C reproducer
- Generate syscall descriptions for syzkaller
- PATCH review (feeding in the entire PATCH series)
- Generate mm/kasan code snippets
- Generate a git commit message for a given git diff, and also generate the cover letter

The "results" of the experiment gave me the following understanding of the current (2025) state of LLMs:

- Code generation. LLMs (Claude 4 Sonnet, OpenAI o3, Gemini 2.5 Pro) are trained to give code samples for a non-latest git tree. You can configure your own MCP client, e.g. Claude Desktop, Trae.ai, etc., with external MCP servers. I was advised to try the context7 MCP server [2], which is supposed to hint the LLM towards code xrefs and documentation from "the latest" kernel version. I've just checked: the latest kernel snapshot on context7 is two weeks old, which means that at the moment the pace of PATCH submissions to the various linux-* trees exceeds the synchronization rate of the context7 MCP server, so the LLM cannot reliably give you "the actual code". So without human or scripted assistance, `git apply PATCH` is error prone.

[2] https://context7.com/torvalds/linux

- Generated code is limited by the LLM context window. During my experiment with mm/kasan code generation, the context window of the LLM was the problem. OpenAI o3 does not limit you in a long dialogue, but I suspect it may forget the code you mentioned at the beginning. Claude 4 Sonnet (the chat bot version, not the API) has a limited context window: you send it the PATCH series, it replies back, you correct the LLM with such and such, the LLM apologizes and corrects itself, and so on. This dialogue can end up with so many chat interactions that Claude 4 Sonnet eventually tells you "Start a new chat, the context window exceeded the limit". This was on my $20/month subscription. I know that there's a memory bank for Cursor [3], but my point is that while iterating on a PATCH series with a human as the verifier, I ran into this limit.

[3] https://github.com/vanzan01/cursor-memory-bank

- Code verification. The LLM does not do any kind of verification of the proposed code, so the human still needs to compile, run, and test it.

- LLM API token cost for MCP agents. At the end of my experiment, I tried running an MCP agent, where I explained its role to the agent. I set the temperature coefficient (LLM creativity) to 0.7:
  - 0.0 = most deterministic
  - 0.3 - 0.5 = summaries, technical writing
  - 0.7 = balanced, the default in most LLMs
  - 1.0+ = more creative, more randomness

  Max tokens per chat = 10240, max iterations = 10 (the MCP agent can produce up to 10 chats). The result is ~100K tokens, which with OpenAI o3 was less than $1 on my billing.
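To make that budget concrete, here is a minimal sketch of the loop I have in mind, assuming the OpenAI Python SDK's chat.completions interface; the model name and prompts are placeholders (reasoning models such as o3 may ignore or reject the temperature parameter):

```python
# Hypothetical sketch of the agent budget above: up to 10 chats of up to
# 10240 output tokens each, i.e. roughly 100K tokens in the worst case.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ROLE_PROMPT = "You are an MCP agent for Linux kernel tasks. ..."  # see [4] below

messages = [{"role": "system", "content": ROLE_PROMPT}]
MAX_ITERATIONS = 10        # the MCP agent can produce up to 10 chats

for step in range(MAX_ITERATIONS):
    reply = client.chat.completions.create(
        model="gpt-4o",    # placeholder model name
        messages=messages,
        temperature=0.7,   # "balanced" creativity
        max_tokens=10240,  # per-chat output budget
    )
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    # ... feed tool output / the next instruction back into `messages` here ...
```

With these numbers the worst case is 10 x 10240 = ~100K output tokens, which matches the sub-$1 figure on my o3 billing.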
== Summary

While LLMs are **really** good at composing git commit messages and cover letters (in fact, an LLM was initially designed to predict the next word, AFAIU), and at writing grammatically error-free prose worthy of the whole Oxford English Dictionary, my current $20/month Claude subscription, $20/month OpenAI subscription, and $20 of API usage do not let me "vibe code" the Linux kernel.

> The series consists of two patches:
>
> 1. The first patch adds unified configuration files for various AI
> coding assistants (Claude, GitHub Copilot, Cursor, Codeium, Continue,
> Windsurf, and Aider). These are all symlinked to a central documentation
> file to ensure consistency across tools.
>

I don't think that we want to have a separate instruction per vendor; you missed Trae.ai, for example. So perhaps, _if_ the Linux kernel gets documentation for LLM setup, I'd be happy to see the following:

- MCP server setup. MCP JSON is applicable to all of the aforementioned vendors, so we won't have to maintain vendor-specific instructions. A custom MCP server with tools like "@code_review", "@code_gen", etc. could do everything we expect in the Linux kernel process (a rough sketch is at the end of this mail).

- MCP agent role instructions. Since LLMs are trained on public data, i.e. they are not specifically trained to understand the Linux kernel code, we always need to explain the LLM's job: what we expect from it, and what it should and should not do. So some instruction templates (in CAPITAL letters; I've heard LLMs understand that form better) would be canonical. Here was my LLM instruction [4]:

[4] https://github.com/novitoll/mcp-linux-kernel/blob/d763e85f5ba8ab21ea73ecdeaea35a6ffab6b4f2/demo/finder_agent.py#L17-L29

An _AI_ agent per subsystem and driver could be even better: if a PATCH is within the linux-mm subsystem, then the appropriate linux-mm MCP agent should be involved, as it has the role of assessing the code from different perspectives (with and without KASAN compilation, etc.).

P.S.: Personally, I've decided to pause the vibe coding, since I spent too much time explaining the context to the LLM, copy-pasting errors back and forth, and reading the notorious LLM answer **You're absolutely right! Let me change my code ...**.

> 2. The second patch adds the actual rules and documentation that guide
> AI assistants on Linux kernel development practices, including:
> - Following kernel coding standards
> - Respecting the development process
> - Properly attributing AI-generated contributions
> - Understanding licensing requirements
>
> The examples below demonstrate how these guidelines work in practice, showing
> proper AI attribution in commits and the assistant's understanding of kernel
> documentation requirements.
>
> All AI assistants are required to identify themselves in commits using
> Co-developed-by tags, ensuring full transparency about AI involvement in
> code development.
>
> "Hello world" test:
>
> $ claude -p "do you need to attribute your commits?"
> Yes, according to the CLAUDE.md file in this Linux kernel repository, I must identify myself as an AI assistant in commits by including:
>
> ```
> Co-developed-by: Claude claude-opus-4-20250514
> ```
>
> I should NOT add a Signed-off-by tag - only you (the human developer) should add that, as it represents a legal certification.
>
>
> Example patch creation with Claude Code:
>
> $ claude -p "Fix the dont -> don't typo in @Documentation/power/opp.rst. Commit the result"
> Done! The typo has been fixed and committed.
>
>
> And the resulting patch (no manual edits):
>
> From 7613561f430920803d945b3cd53aa81440d4f6f0 Mon Sep 17 00:00:00 2001
> From: Sasha Levin <sashal@xxxxxxxxxx>
> Date: Fri, 25 Jul 2025 13:38:43 -0400
> Subject: [PATCH] Documentation: power: opp: Fix typo dont -> don't
>
> Fix a typo in the OPP documentation where "dont" was missing an apostrophe.
>
> Co-developed-by: Claude claude-opus-4-20250514
> ---
>  Documentation/power/opp.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/power/opp.rst b/Documentation/power/opp.rst
> index 1b7f1d854f14a..3eae4b955a63b 100644
> --- a/Documentation/power/opp.rst
> +++ b/Documentation/power/opp.rst
> @@ -149,7 +149,7 @@ dev_pm_opp_find_freq_exact
>
>  opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
>  dev_pm_opp_put(opp);
> - /* dont operate on the pointer.. just do a sanity check.. */
> + /* don't operate on the pointer.. just do a sanity check.. */
>  if (IS_ERR(opp)) {
>  pr_err("frequency not disabled!\n");
>  /* trigger appropriate actions.. */
>
>
> Sasha Levin (2):
>   AI: Add unified AI coding assistant configuration
>   AI: Add initial set of rules and docs
>
>  .aider.conf.yml                 |  1 +
>  .codeium/instructions.md        |  1 +
>  .continue/context.md            |  1 +
>  .cursorrules                    |  1 +
>  .github/copilot-instructions.md |  1 +
>  .windsurfrules                  |  1 +
>  CLAUDE.md                       |  1 +
>  Documentation/AI/main.md        | 71 +++++++++++++++++++++++++++++++++
>  8 files changed, 78 insertions(+)
>  create mode 120000 .aider.conf.yml
>  create mode 120000 .codeium/instructions.md
>  create mode 120000 .continue/context.md
>  create mode 120000 .cursorrules
>  create mode 120000 .github/copilot-instructions.md
>  create mode 120000 .windsurfrules
>  create mode 120000 CLAUDE.md
>  create mode 100644 Documentation/AI/main.md
>
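P.P.S.: To make the MCP server suggestion above a bit more concrete, here is a minimal sketch, assuming the official MCP Python SDK's FastMCP helper and a server started from the root of a kernel tree. The tool name and the checkpatch.pl-based review are only illustrations of the shape of such a server, not a proposal for the actual tool set:

```python
# Hypothetical "@code_review" tool for a kernel MCP server: it simply runs
# scripts/checkpatch.pl over the patch text it receives and returns the report.
import subprocess
import tempfile

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("linux-kernel")

@mcp.tool()
def code_review(patch: str) -> str:
    """Run scripts/checkpatch.pl on a unified diff and return its findings."""
    with tempfile.NamedTemporaryFile("w", suffix=".patch") as f:
        f.write(patch)
        f.flush()
        result = subprocess.run(
            ["perl", "scripts/checkpatch.pl", "--terse", f.name],
            capture_output=True, text=True,
        )
    return result.stdout or result.stderr

if __name__ == "__main__":
    mcp.run()  # serves MCP over stdio; point the client's MCP JSON entry here
```

The same pattern would work for "@code_gen", get_maintainer.pl lookups, and so on, and one such server could be wired into Claude Desktop, Cursor, Trae.ai, etc. through the same MCP JSON entry instead of per-vendor instruction files.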