The Capitalist's Conscience

Anthropic says AI is starting to build better AI, and the only brake may be mutually assured restraint

Jun 08, 2026

Mutually Assured Destruction / Restraint

Anthropic has a habit of periodically walking into the middle of the AI party, turning down the music, and reminding everyone that the building may be on fire.

Its latest essay, “When AI Builds Itself,” is very much in that tradition. It is long, earnest, alarming, and almost guaranteed to be interpreted through whatever lens people already use to view Anthropic.

To the believers, this is Anthropic doing what it has always claimed to do: sounding the alarm from inside the machine room.

To the skeptics, it is another piece of exquisitely calibrated safety theater: warn about the most extreme version of the future, make the market opportunity look civilization-sized, and position yourself as the only responsible adult in a room full of reckless accelerationists.

There is plenty of material for the skeptics. Anthropic is a frontier AI company warning that frontier AI companies are moving too fast. It is arguing for restraint while competing aggressively. It recently declined to broadly release Mythos Preview, a decision supporters saw as responsible and critics saw as buzz-maxxing. The most powerful model is always more interesting when almost nobody can use it.

And yet, pure cynicism feels too easy here.

Maybe this will make me look naive in hindsight, but I do think there is a real ring of sincerity in Anthropic’s public discomfort. The posture is almost comically awkward: “We cannot stop unless you stop, but we think both of us should stop.” It sounds absurd. It is also basic game theory.

A unilateral pause by Anthropic would mostly be a transfer of advantage to OpenAI, Google, xAI, Meta, or some state-backed lab elsewhere. A coordinated pause, meanwhile, would require verification, trust, enforcement, shared definitions, shared triggers, and some way of knowing that nobody is secretly training the next model in a desert data center while everyone else behaves nicely.

Anthropic cannot afford to say “we are opting out of the race.” So it is saying “we are trapped in a race that requires intervention by institutions capable of making races pause.”
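
For readers who want the "basic game theory" spelled out, here is a minimal sketch of the bind as a two-lab game. The payoff numbers are hypothetical and only their ordering matters; the `with_enforcement` penalty is an illustrative stand-in for whatever verification and enforcement might look like, not anything Anthropic proposes in the essay.

```python
from itertools import product

MOVES = ("pause", "race")

# Hypothetical ordinal payoffs (higher is better) for two labs, A and B.
# Keys are (A's move, B's move); values are (A's payoff, B's payoff).
PAYOFFS = {
    ("pause", "pause"): (3, 3),  # coordinated restraint
    ("pause", "race"): (0, 4),   # A pauses alone and hands B the advantage
    ("race", "pause"): (4, 0),   # B pauses alone and hands A the advantage
    ("race", "race"): (1, 1),    # everyone keeps racing
}

def best_response(payoffs, player, other_move):
    """Move that maximizes this player's payoff, given the other lab's move."""
    def payoff(move):
        profile = (move, other_move) if player == 0 else (other_move, move)
        return payoffs[profile][player]
    return max(MOVES, key=payoff)

def nash_equilibria(payoffs):
    """Profiles where each lab's move is a best response to the other's."""
    return [
        (a, b)
        for a, b in product(MOVES, MOVES)
        if best_response(payoffs, 0, b) == a and best_response(payoffs, 1, a) == b
    ]

def with_enforcement(payoffs, penalty):
    """Assumed enforcement: a lab that races while the other pauses loses `penalty`."""
    adjusted = dict(payoffs)
    a, b = payoffs[("race", "pause")]
    adjusted[("race", "pause")] = (a - penalty, b)
    a, b = payoffs[("pause", "race")]
    adjusted[("pause", "race")] = (a, b - penalty)
    return adjusted

if __name__ == "__main__":
    print(nash_equilibria(PAYOFFS))
    print(nash_equilibria(with_enforcement(PAYOFFS, penalty=2)))
```

With these made-up numbers, racing is the best reply whatever the other lab does, so mutual racing is the only stable outcome even though both labs would prefer a mutual pause. Once secretly racing carries a credible cost, mutual pause becomes stable too, although mutual racing remains an equilibrium. That is roughly why the ask is for outside institutions rather than unilateral virtue.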

The core argument of the essay is that AI is no longer just changing how people work, but how AI itself gets built. Anthropic lays out the progression as a kind of corporate evolution chart.

In the early days, humans wrote code and documents on laptops. Then chatbots helped generate snippets. Then coding agents began editing entire files. Today, agents can run code, delegate work to other agents, and complete multi-hour tasks. The next step, if the curve continues, is what Anthropic calls “closing the loop”: AI systems capable of designing, training, testing, and improving their own successors.

This is recursive self-improvement, the old science-fiction idea that an intelligent system could improve itself, then use the improved version to improve itself again, and so on. The phrase has historically lived somewhere between LessWrong, Bostrom, and late-night existential panic. Anthropic is trying to drag it into the operating plan.

And they come bearing evidence:

As of May 2026, more than 80% of the code merged into Anthropic’s production codebase was authored by Claude. Before Claude Code launched in February 2025, that number was in the low single digits. The typical Anthropic engineer now merges 8x as much code per day as in 2024.
In April, Claude reportedly shipped more than 800 fixes that reduced a class of API errors by a factor of 1,000. The engineer overseeing the work estimated it would have taken a human four years to complete.
Claude’s success rate on open-ended coding tasks reached 76% in May 2026, up 50 points in 6 months. In one example, after a routine upgrade began crashing tens of thousands of training jobs, an engineer pointed Claude at the live incident with little more than text context and cluster access. Claude isolated an obscure debugging flag, reproduced the issue, and confirmed a fix in about two hours. Normally, that would have been 2-3 days of work.

That is the first transition: from labor to leverage.

The second transition is more important: from execution to judgment.

In an internal test, Anthropic took 129 real research moments where a human had gone somewhat off-course. It then showed different Claude models the session up to that point and asked what they would do next. A separate judge compared the model’s proposed next step against the human’s actual next step. In November 2025, Claude Opus 4.5 beat the human choice 51% of the time. By April 2026, Mythos Preview beat the human choice 64% of the time.

This is not a clean “AI beats researchers” result (yet). Anthropic picked moments where the human choice had room for improvement. The test is narrow. The judge was itself a Claude model. There are enough caveats here to keep a methodologist busy for a weekend.
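
One caveat is easy to put numbers on: 129 moments is a small sample. The sketch below is a rough back-of-the-envelope check, not anything from Anthropic's essay. It assumes each moment is an independent trial, takes the judge's verdicts at face value, and uses the textbook normal approximation for a proportion; `win_rate_interval` is a helper name made up for illustration.

```python
import math

N_MOMENTS = 129  # number of research moments reported in the essay

def win_rate_interval(win_rate, n, z=1.96):
    """Normal-approximation interval (about 95% for z=1.96) around an observed win rate."""
    standard_error = math.sqrt(win_rate * (1 - win_rate) / n)
    return (win_rate - z * standard_error, win_rate + z * standard_error)

if __name__ == "__main__":
    results = [
        ("Claude Opus 4.5, Nov 2025", 0.51),
        ("Mythos Preview, Apr 2026", 0.64),
    ]
    for label, rate in results:
        low, high = win_rate_interval(rate, N_MOMENTS)
        print(f"{label}: {rate:.0%} (roughly {low:.0%} to {high:.0%})")
```

Under those assumptions, the interval around the 51% result straddles a coin flip, while the interval around the 64% result sits clearly above 50%. The sample is too small to say much about the earlier model, but the later number is harder to wave away as noise.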

Still, the result matters. Research is not mostly grand eureka moments. It is a long chain of next-step decisions. Try this. Ignore that. Rerun the experiment. Check the data. Change the optimizer. Kill the project. Double down. Most scientific and engineering progress is not cinematic inspiration. It is judgment applied repeatedly to friction. And the judgment appears to be improving quickly.

The essay then goes on to sketch three possible futures.

In the first, the trend stalls. Today’s models diffuse widely, but the curve bends. Maybe research judgment is not easily learned through scaling. Maybe compute, energy, chips, interconnects, or data become hard constraints. Maybe the whole thing turns into an S-curve and society gets more time to adapt. Anthropic says this is possible, but does not think it is likely.
In the second future, AI labs continue to see compounding efficiency gains, but humans still set direction. This is already dramatic. Anthropic imagines 100-person companies doing the work of 10,000 or 100,000-person organizations. That is the “pyramid of agents” world: every employee becomes a manager of synthetic labor, and every organization becomes less constrained by headcount.
This future is transformational enough. It changes software, services, security, government operations, influence campaigns, authoritarian surveillance, and probably the structure of the firm itself.
In the third future, the loop closes. AI systems become capable of full recursive self-improvement. Humans move from builders to overseers of a virtual lab. The pace of progress becomes determined less by human research labor and more by compute. This is where Anthropic’s tone changes from “interesting productivity uplift” to “arms-control problem.”

They are not arguing that full recursive self-improvement is here, but that the evidence points far enough in that direction that waiting for proof may be a terrible strategy. By the time everyone agrees the loop has closed, the institutions needed to govern it will already be obsolete.

So what are they advocating for? The option to slow or temporarily pause frontier AI development. Crucially, they want that option to be verifiable and coordinated across frontier labs and countries. Think of the deterrence logic behind mutually assured destruction, repurposed as mutually assured restraint.

This future may not arrive. The curve may bend. The physical world may impose friction. Society may remain stubbornly slow. As Anthropic notes, more intelligence cannot make a clinical trial reveal ten-year side effects in a week, hold elections sooner than a constitution allows, or turn a stranger into an old friend over a weekend. The lab can move at compute speed, but the world still moves at human speed.

But that mismatch is exactly the problem. If recursive intelligence accelerates upstream while institutions remain slow downstream, the risk is that our ability to understand, govern, and absorb it falls further behind every month.

You can think Anthropic is self-interested and still think this is true. You can roll your eyes at the theatrical caution and still notice that the numbers are strange. You can dislike the nuclear analogy and still admit that verification is the core issue. You can believe the market will overhype this and still believe the underlying curve is moving faster than society can process. That is what makes the essay interesting.

