
Agents That Build Themselves: The Rise of Self-Improving AI

AI agents are learning to optimize their own code, discover new tools, and accumulate skills over time. Here's how recursive self-improvement works, why it matters, and what keeps researchers up at night.

AI · AI Agents · Self-Improving AI · Machine Learning

In 2023, an AI agent dropped into a Minecraft world with no instructions and taught itself to mine diamonds. It wrote its own code, tested it against the game environment, and when something worked, it saved the program to a growing skill library for later reuse. Within hours, it had accumulated dozens of reusable abilities — each one built on top of the last.

That agent was Voyager, and it offered a glimpse of something researchers had theorized about for decades: AI systems that improve themselves. Three years later, we're no longer talking about Minecraft. Self-improving agents are rewriting their own architectures, discovering tools nobody programmed them to use, and, according to one recent survey, keeping 80% of the leading AI researchers polled up at night.

Here's what's happening, how the technology works, and why it matters for anyone building with AI in 2026.

The Core Idea: Agents That Learn From Their Own Output

Traditional AI agents follow a fixed loop. You define their tools, write their prompts, and deploy them. If they encounter a problem outside their playbook, they fail. Self-improving agents break this pattern by treating their own behavior as something to optimize.

The concept isn't new — Jürgen Schmidhuber proposed the Gödel Machine decades ago, a theoretical system that could rewrite any part of its own code whenever it could mathematically prove the change would help. The catch was that proof requirement. In practice, no system could generate those proofs efficiently enough to be useful.

What changed is that large language models gave agents a powerful shortcut: instead of formal proofs, they can use LLMs to reason about whether a modification is likely to improve performance, test it empirically, and keep what works. This is messier than mathematical certainty, but it actually runs.

How Self-Improvement Works in Practice

The landscape of self-improving agents has exploded over the past two years. The approaches vary, but they share a common architecture: a feedback loop where the agent evaluates its own performance and modifies its behavior accordingly. Here are the key systems pushing the field forward.

Skill Libraries: The Voyager Approach

Voyager (2023) introduced a deceptively simple pattern. The agent prompts an LLM to write code for a task, executes it, checks whether it worked, and if so, stores the program in a skill library — a growing collection of verified, reusable capabilities. The next time it faces a similar challenge, it retrieves the relevant skill instead of starting from scratch.

This "write, test, store, reuse" loop means the agent gets measurably better over time. Voyager explored 2.3x more of the Minecraft map and unlocked 3.3x more items than agents without a skill library. The pattern has since been adopted well beyond gaming — modern agent frameworks like CrewAI and LangGraph now support similar memory-backed skill accumulation for production workloads.
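
As a rough illustration (not Voyager's actual code), the loop can be sketched with a stub standing in for the LLM's code generator and a stub for environment verification:

```python
# Minimal sketch of the Voyager-style "write, test, store, reuse" loop.
# `generate_program` and `verify` are illustrative stubs: in the real system,
# the first is an LLM call and the second runs the program in the environment.
from typing import Callable, Dict, Optional

class SkillLibrary:
    """Stores verified programs keyed by task name for later reuse."""
    def __init__(self) -> None:
        self.skills: Dict[str, Callable[[], str]] = {}

    def retrieve(self, task: str) -> Optional[Callable[[], str]]:
        return self.skills.get(task)

    def store(self, task: str, program: Callable[[], str]) -> None:
        self.skills[task] = program

def generate_program(task: str) -> Callable[[], str]:
    # Placeholder for "prompt an LLM to write code for the task".
    return lambda: f"did {task}"

def verify(task: str, result: str) -> bool:
    # Placeholder for checking the program's effect on the environment.
    return task in result

def solve(task: str, library: SkillLibrary) -> str:
    skill = library.retrieve(task)        # reuse a verified skill if one exists
    if skill is None:
        skill = generate_program(task)    # otherwise write a new program...
        if verify(task, skill()):         # ...test it against the environment...
            library.store(task, skill)    # ...and store it only if it works
    return skill()

lib = SkillLibrary()
solve("mine_iron", lib)    # first call: generate, verify, store
solve("mine_iron", lib)    # second call: retrieved from the library
print(sorted(lib.skills))  # -> ['mine_iron']
```

The point of the pattern is the asymmetry: generation is expensive and unreliable, retrieval is cheap and verified, so every stored skill permanently lowers the cost of related future tasks.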

Recursive Code Rewriting: Gödel Agent and DGM

The Gödel Agent (2024) took the concept further. Rather than just accumulating new skills, it modifies its own logic — including the code responsible for deciding what to modify. It uses an LLM to analyze its performance, propose changes to its reasoning pipeline, and evaluate whether those changes improved outcomes. The "self-referential" part is key: the modification engine can modify itself.

Sakana AI's Darwin Gödel Machine (DGM) builds on this with an evolutionary twist. Instead of relying on a single agent's judgment about what to change, DGM maintains a population of agent variants. Each generation introduces mutations — changes to the agent's architecture, prompts, or tool use — and the best-performing variants survive. The result is that DGM discovers general agent design improvements rather than model-specific tricks, and it does so without human guidance on what to optimize.
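
A toy sketch of that generational loop, with a single numeric parameter standing in for an agent's architecture and an invented fitness function standing in for a real benchmark run (nothing here is DGM's actual code):

```python
# Toy DGM-style generational loop: keep a population of agent configurations,
# mutate them, score them, and let the best variants survive into the next
# generation. `fitness` is a made-up stand-in for "run the agent on a
# benchmark suite"; it peaks at temperature 0.3 purely for illustration.
import random

def mutate(config: dict) -> dict:
    # Placeholder mutation: nudge one "architecture" parameter.
    child = dict(config)
    child["temperature"] = round(
        min(1.0, max(0.0, child["temperature"] + random.uniform(-0.1, 0.1))), 3
    )
    return child

def fitness(config: dict) -> float:
    # Toy benchmark: scores peak at temperature 0.3.
    return 1.0 - abs(config["temperature"] - 0.3)

def evolve(generations: int = 20, pop_size: int = 8) -> dict:
    random.seed(0)  # deterministic for the sketch
    population = [{"temperature": 0.9} for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(p) for p in population for _ in range(2)]
        # Keep parents in the pool (elitism), then prune to the best variants.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
    return population[0]

best = evolve()
print(best)  # temperature should have drifted toward the toy optimum of 0.3
```

Real DGM mutations change code, prompts, and tool wiring rather than one scalar, but the selection pressure works the same way: whatever survives the benchmark defines the next generation.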

Both systems face a real limitation: error accumulation. Each self-modification introduces the possibility of subtle bugs that compound over subsequent rounds. The Gödel Agent paper acknowledges this directly — the system "is not sufficiently stable and may be prone to error accumulation, hindering its ability to continue self-optimization." Solving this is one of the field's open problems.

Learning to Improve Across Turns: RISE

RISE (Recursive Introspection), published at NeurIPS 2024, takes a different angle. Instead of rewriting code, it trains models to improve their own outputs over multiple attempts at the same problem. Using on-policy rollouts scored by a reward function, RISE fine-tunes an LLM to look at its previous answer, identify what went wrong, and generate a better response on the next turn.

This is closer to how humans actually improve — not by rewriting our brains, but by reflecting on mistakes and trying again. The practical implication is significant: a RISE-trained model gets better the more turns you give it, making it well-suited for complex tasks where the first answer is rarely perfect.

Evolutionary Algorithm Design: AlphaEvolve

In May 2025, Google DeepMind unveiled AlphaEvolve, an evolutionary coding agent that uses Gemini to design and optimize algorithms. Rather than improving itself as an agent, AlphaEvolve improves the algorithms it produces — generating candidate solutions, evaluating them, and evolving the best performers across generations.

AlphaEvolve represents a subtler form of self-improvement: the agent's own code stays fixed, but it produces increasingly sophisticated outputs through an iterative evolutionary process. This approach sidesteps the error accumulation problem of recursive self-modification while still achieving compounding gains.
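
The generate, evaluate, evolve loop can be caricatured in a few lines. Here the "candidate algorithms" are tiny arithmetic expressions and fitness is error against a hidden target function; this is an illustration of the loop's shape, not AlphaEvolve's implementation:

```python
# Evolving candidate programs (tiny expressions) toward a target function.
# The agent's own code stays fixed; only the candidates it produces evolve.
import random

def target(x: int) -> int:
    return x * x + 1  # the "algorithm" evolution should rediscover

OPS = ["+", "-", "*"]
TERMS = ["1", "2", "3", "x"]

def random_expr() -> str:
    # Candidates are expressions of the form "term op term op term".
    t = lambda: random.choice(TERMS)
    o = lambda: random.choice(OPS)
    return f"{t()} {o()} {t()} {o()} {t()}"

def mutate(expr: str) -> str:
    # Change one token, keeping operators as operators and terms as terms.
    tokens = expr.split()
    i = random.randrange(len(tokens))
    tokens[i] = random.choice(OPS if tokens[i] in OPS else TERMS)
    return " ".join(tokens)

def error(expr: str) -> int:
    # Evaluate the candidate on sample inputs against the target.
    return sum(abs(eval(expr, {"x": x}) - target(x)) for x in range(1, 6))

def evolve_expr(generations: int = 40, pop: int = 12):
    random.seed(1)  # deterministic for the sketch
    population = [random_expr() for _ in range(pop)]
    for _ in range(generations):
        children = [mutate(p) for p in population]
        population = sorted(population + children, key=error)[:pop]
    best = population[0]
    return best, error(best)

best, err = evolve_expr()
print(best, err)  # error should shrink well below a naive guess like "x + 1"
```

Because only the outputs evolve, a bad mutation is simply discarded at evaluation time; it never corrupts the machinery doing the evolving, which is exactly how this approach avoids error accumulation.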

The Tool Discovery Frontier

One of the most exciting — and unsettling — developments is agents that discover and create their own tools. Projects like MCP-Zero and AutoTIR (presented at ICLR 2025) use reinforcement learning to train agents that don't just use pre-defined tools but actively search for, evaluate, and integrate new ones.

This matters because tool availability is one of the biggest constraints on agent capability. An agent with access to a database connector, a web scraper, and a code executor can solve fundamentally different problems than one limited to text generation. When agents can find their own tools, their capability space expands without human intervention.
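
A minimal sketch of the discovery step, with an invented three-tool registry and a naive keyword-overlap score standing in for real tool search:

```python
# Hypothetical tool discovery: instead of a fixed toolset, the agent searches
# a registry of tool descriptions and attaches whichever best matches the
# task. The registry contents and scoring rule are purely illustrative.
REGISTRY = {
    "sql_query":  "run sql queries against a relational database",
    "web_scrape": "fetch and parse the contents of a web page",
    "run_python": "execute python code in a sandbox and return output",
}

def discover_tool(task: str) -> str:
    """Return the registry tool whose description best overlaps the task."""
    task_words = set(task.lower().split())
    def score(item):
        name, description = item
        return len(task_words & set(description.split()))
    return max(REGISTRY.items(), key=score)[0]

print(discover_tool("fetch the latest headlines from a news web page"))
# -> web_scrape
```

Production systems replace the keyword overlap with embedding similarity and the dict with a live registry of servers, but the control flow is the same: describe the task, rank available tools, attach the winner.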

Anthropic's Model Context Protocol (MCP) has accelerated this trend by standardizing how agents connect to external tools. With 97 million monthly SDK downloads and 10,000+ active servers as of early 2026, MCP has created a vast ecosystem of tools that self-improving agents can potentially discover and leverage. Google's complementary Agent2Agent (A2A) protocol extends this to agent-to-agent communication, meaning agents can now not only find tools but find other agents with specialized capabilities.

The trajectory points toward agents that write their own tools from scratch; some researchers predict this will happen, with no human involvement, as soon as late 2026.

What Keeps Researchers Up at Night

This brings us to the uncomfortable part. In a recent survey of 25 leading AI researchers from Google DeepMind, OpenAI, Anthropic, Meta, UC Berkeley, Princeton, and Stanford, 20 of 25 identified automating AI research as one of the most severe and urgent AI risks.

The concern isn't about today's systems. It's about the feedback loop. If an AI agent can improve AI agents, and those improved agents can improve AI agents even further, you get a recursive cycle where capability gains compound faster than humans can evaluate them. This is the classic recursive self-improvement scenario that safety researchers have worried about for years — except now the building blocks actually exist.

ICML 2026 is hosting a dedicated workshop on "AI with Recursive Self Improvement," acknowledging that "loops that rewrite prompts, weights, or hypotheses already operate inside foundation-model pipelines, yet their behavior remains poorly characterized." The organizers note bluntly that "evaluation, safety, and governance tools lag behind algorithmic progress."

Current safety measures are modest. Sakana AI runs all DGM self-modifications in sandboxed environments with human oversight and restricted web access. The Gödel Agent paper proposes formal verification of modifications (circling back to Schmidhuber's original idea). But these are research-lab safeguards, not production-grade governance frameworks.

The gap between capability and control is real. As Deloitte's 2026 agentic AI report puts it: "Organizations are deploying agents faster than they can secure them."

What This Means If You're Building With Agents

You don't need to be building a Gödel Machine to benefit from these ideas. The self-improvement patterns showing up in research are filtering into practical agent development in accessible ways.

Skill accumulation is production-ready. The Voyager pattern — write, test, store, reuse — is now baked into frameworks like LangGraph and CrewAI. If your agent does the same type of task repeatedly, it should be building a skill library. In the Claude ecosystem, SKILL.md files have become a cross-platform standard for encoding reusable procedural knowledge, working across Claude Code, Codex, and Gemini CLI.

Multi-turn reflection is cheap and effective. You don't need RISE's full training pipeline to get the benefit. Simply prompting an agent to review its own output, identify weaknesses, and try again — a basic "reflection loop" — consistently improves results. Most agent frameworks now support this as a first-class pattern.
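
The whole pattern fits in a dozen lines. In this sketch, `llm` is a toy stub standing in for a real model call; in practice, each step is one prompt to your model of choice:

```python
# Basic reflection loop: draft -> critique -> revise, repeated until the
# critic is satisfied or the turn budget runs out. `llm` is a deterministic
# stub so the sketch runs; swap in a real model call in practice.
def llm(prompt: str) -> str:
    if prompt.startswith("CRITIQUE"):
        return "the answer is missing a citation"  # toy critic, never satisfied
    return prompt.split(":", 1)[1].strip() + " [revised]"  # toy reviser

def reflect_and_revise(task: str, draft: str, max_turns: int = 2) -> str:
    answer = draft
    for _ in range(max_turns):
        critique = llm(f"CRITIQUE this answer to '{task}': {answer}")
        if "looks good" in critique:  # stop early once the critic approves
            break
        answer = llm(f"REVISE using the critique '{critique}': {answer}")
    return answer

print(reflect_and_revise("summarize the report", "Draft summary."))
# -> Draft summary. [revised] [revised]
```

Two design knobs matter in practice: cap `max_turns` (gains flatten quickly and each turn costs tokens), and make the critique prompt concrete ("check the numbers", "verify each claim has a source") rather than a generic "improve this".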

Tool discovery is the next leverage point. With MCP's ecosystem of 10,000+ servers, the bottleneck is no longer building integrations but helping your agent find the right ones. Anthropic's Tool Search feature and patterns like MCP-Zero point toward a future where agents dynamically expand their own capabilities.

Context engineering matters more than prompt engineering. The emerging consensus in 2026 is that the real skill in working with autonomous agents is context engineering — giving agents the right information, tools, and memory at the right time. This is the human-side equivalent of self-improvement: optimizing the environment your agent operates in.

Where This Goes Next

METR's research shows the length of tasks AI agents can complete (measured at a 50% success rate) doubling roughly every seven months — from one-hour tasks in early 2025 to eight-hour autonomous workstreams by late 2026. If that trajectory holds, we're heading toward agents that can handle multi-day projects independently by 2027.
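
The arithmetic behind that projection is plain exponential doubling; under the stated assumptions:

```python
# Projecting agent task length under a fixed doubling period:
# d(t) = d0 * 2 ** (t / doubling_months)
def projected_hours(d0_hours: float, months_elapsed: float,
                    doubling_months: float = 7.0) -> float:
    return d0_hours * 2 ** (months_elapsed / doubling_months)

# Early 2025 to late 2026 is ~21 months, i.e. three doublings: 1h -> 8h.
print(projected_hours(1.0, 21))  # -> 8.0
```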

The self-improvement dimension accelerates this timeline. An agent that accumulates skills, discovers tools, and refines its own reasoning doesn't just do more — it gets faster at getting better. The compounding effect is what makes this technology both extraordinarily promising and genuinely difficult to govern.

The agents are building themselves. The question for 2026 isn't whether that will continue, but whether we'll build the evaluation, safety, and governance frameworks fast enough to keep up.

