On Mon, Jul 28, 2025 at 11:40 AM Kees Cook <kees@xxxxxxxxxx> wrote:
>
> On Sun, Jul 27, 2025 at 03:45:42PM +0000, Dr. David Alan Gilbert wrote:
> > When doing qemu dev, I frequently run it in a tmux, and start it with
> > '-nographic' which gets you a single stream with both serial and monitor in it;
> > alternatively you can get one pane with the serial output and one with the
> > monitor, that takes a little more setup;
>
> Yeah, I haven't played with it yet, but I expect I'll need to try several
> approaches and see which the agent can best deal with. It's better with
> non-interactive stuff, so I'm thinking that giving it tooling that will
> run a script at boot or have the image bring up ssh for the agent to run
> individual commands via ssh... it all depends on what the agent can wrap
> its logic around.

FWIW, if we ask an LLM to produce code, it replies with some description
plus the code section embedded in the paragraph, so in that pipeline we
need to pre-process the LLM output before we can test anything.

But I believe there's another way. We explain to the MCP agent its role in
the instruction and tell it to save the code output to a designated
directory. This should be possible using an MCP filesystem server with RW
access to that directory, so we're ready to test the generated git diffs
or C code. Testing can also be orchestrated by a separate MCP agent which
is instructed to take the code from the output directory and run QEMU with
a specific arch, config, etc. The code-generating and testing agents can
then optimize by themselves. There's an MCP agent framework with an
"Evaluator-Optimizer" workflow [1] that iterates on the output until it
reaches some EXCELLENT quality rating, which is a vague description for me
(rough sketch of such a loop at the end of this mail).

[1] https://github.com/lastmile-ai/mcp-agent/blob/main/examples/workflows/workflow_evaluator_optimizer/main.py#L57

The downside is that all of this works via LLM APIs, which are not free.
But it is an orchestrated way of verifying LLM code generation, I guess.

In local development, we could grep the git diff out of the LLM's reply,
run QEMU via a script for the test, and evaluate the correctness of the
code ourselves (also sketched at the end of this mail). The only thing
charging money here would be the LLM model itself, if it comes from a
vendor. If the Linux kernel had its own trained, Ollama-like free models
to download, that would be even better.

>
> --
> Kees Cook
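Very rough sketch of the agent loop I mean, written from memory of the
example in [1] (untested; the "filesystem" server name, the /work/llm-out
directory, the instructions and the sample task are all made up, and the
import paths may well be off):

import asyncio

from mcp_agent.app import MCPApp
from mcp_agent.agents.agent import Agent
from mcp_agent.workflows.llm.augmented_llm_openai import OpenAIAugmentedLLM
from mcp_agent.workflows.evaluator_optimizer.evaluator_optimizer import (
    EvaluatorOptimizerLLM,
    QualityRating,
)

app = MCPApp(name="kernel_patch_loop")


async def main():
    async with app.run():
        # Code generator: gets the MCP filesystem server (configured with RW
        # access to /work/llm-out in mcp_agent.config.yaml) so it can write
        # its git diff there instead of embedding it in prose.
        coder = Agent(
            name="kernel_coder",
            instruction="Produce a git diff for the requested change and "
                        "save it as /work/llm-out/patch.diff.",
            server_names=["filesystem"],
        )
        # Reviewer: rates the result; the rating drives the optimize loop.
        reviewer = Agent(
            name="kernel_reviewer",
            instruction="Rate the diff in /work/llm-out/patch.diff; only "
                        "EXCELLENT if it builds and boots under QEMU.",
            server_names=["filesystem"],
        )
        loop = EvaluatorOptimizerLLM(
            optimizer=coder,
            evaluator=reviewer,
            llm_factory=OpenAIAugmentedLLM,
            # the "EXCELLENT" knob mentioned above
            min_rating=QualityRating.EXCELLENT,
        )
        result = await loop.generate_str(
            message="Convert drivers/foo to use dev_err_probe()."
        )
        print(result)


if __name__ == "__main__":
    asyncio.run(main())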
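And the local, API-free test side could be as dumb as this (untested
sketch; the kernel tree path, initramfs, arch and the SELFTEST-PASS marker
printed by the test init are placeholders):

#!/usr/bin/env python3
# Apply the LLM's diff, rebuild, boot under QEMU, grep the console log.
import subprocess
import sys

KERNEL_TREE = "/work/linux"
DIFF = "/work/llm-out/patch.diff"
INITRD = "/work/initramfs.cpio.gz"  # init runs the test, prints the marker
IMAGE = KERNEL_TREE + "/arch/x86/boot/bzImage"


def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


def main():
    # 1. Apply the generated diff on a clean tree.
    run(["git", "-C", KERNEL_TREE, "checkout", "--", "."])
    run(["git", "-C", KERNEL_TREE, "apply", DIFF])

    # 2. Rebuild.
    run(["make", "-C", KERNEL_TREE, "defconfig"])
    run(["make", "-C", KERNEL_TREE, "-j8"])

    # 3. Boot non-interactively: serial console on stdio, -no-reboot so a
    #    panic terminates QEMU instead of wedging the run.
    try:
        qemu = subprocess.run(
            ["qemu-system-x86_64", "-nographic", "-no-reboot",
             "-kernel", IMAGE, "-initrd", INITRD,
             "-append", "console=ttyS0 panic=-1"],
            capture_output=True, text=True, timeout=300,
        )
    except subprocess.TimeoutExpired:
        print("FAIL (boot timed out)")
        sys.exit(1)

    # 4. Grep the console output for the marker the test init should print.
    ok = "SELFTEST-PASS" in qemu.stdout
    print("PASS" if ok else "FAIL")
    sys.exit(0 if ok else 1)


if __name__ == "__main__":
    main()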