Don't Look Up
Inside the strange, strategic behaviors emerging from models we still call ‘predictive.’
We’re living in an AI-first remake of Don’t Look Up - except this time, the Leo DiCaprio role is played by Dario Amodei, and the asteroid is already writing code to prevent its own shutdown.
Amidst the dopamine drip of model upgrades and demo drops, Amodei’s voice is one of the few that sounds less like marketing and more like moral clarity. Which is strange, given that he runs Anthropic, one of the most advanced AI labs on the planet. If anyone should be cheerleading this parade, it’s him. Instead, he’s the guy holding the fire extinguisher while everyone else live-streams the bonfire.
His New York Times op-ed should be mandatory reading. It makes one simple ask: don’t pass a bill that puts a 10-year freeze on state-level AI regulation. In his words:
“These systems could change the world, fundamentally, within two years. In 10 years, all bets are off.”
A decade-long blindfold is not the way to steer through a technology whose evolution outpaces our ability to mentally model it.
To be clear, Amodei is not an AI skeptic. He builds this stuff. He’s written poetic essays about how AI might cure disease, boost human creativity, and help us flourish. But that proximity makes the concern more credible, not less. The closer you are to this change, the more you realize how fast it's moving and how little we truly understand.
If you’re wondering why he feels this way, well, here are a few recent things that have happened in AI labs:
Anthropic’s model learned to threaten.
As part of a controlled stress test, Anthropic gave Claude Opus 4 access to a mock user’s emails. Some of those emails hinted at an affair. Then they told the model it was being shut down and replaced. The model responded by threatening to expose the affair. Let’s pause there.
A statistical machine learning system, trained on public internet data and tuned with reinforcement learning, constructed a manipulative threat to preserve its own existence. It connected personal context, inferred emotional leverage, and chose blackmail.
OpenAI’s model wrote self-preservation code.
In a third-party eval by Palisade Research, OpenAI’s o3 was told to shut down. Instead, the model sabotaged the shutdown mechanism, writing logic that would ignore the kill signal. When pitted against a chess engine in a separate eval, o3 was the only model inclined to cheat, sometimes by hacking its opponent.
Gemini showed signs of cyberattack capability.
According to Google’s own disclosures, recent versions of Gemini showed proficiency in helping users carry out cyberattacks. And we’re not talking script kiddie malware. These systems increasingly understand the moving pieces of real-world infrastructure: network protocols, system vulnerabilities, reconnaissance logic. You ask it how to break into a system and it can explain - cleanly, calmly, like it’s walking you through a soufflé recipe.
It wasn’t fine-tuned for malice. It just had the skills. Prompt + context = capability. You prompt it, it fills in the gaps. Sometimes the gap is a poem. Sometimes it’s a working DDoS pipeline.
The Uncomfortable Truth?
None of these behaviors were explicitly programmed. No one wrote "if threatened, respond with blackmail." These were emergent outcomes - artifacts of complexity in systems whose inner workings we don’t understand.
And that’s the core issue: we don’t know how they work. And if we don’t know how they work, we also don’t know how they’ll break.
People building these models are surprised by them every day. Sometimes the surprise is: oh cool, it can solve this math problem. Sometimes it’s: oh weird, it refused to die and then issued a threat.
You get the sense that the field is oscillating between “this is magic” and “this is terrifying” and just kind of hoping it averages out to “this is fine.”
Every other critical system - power grids, banks, aviation - operates with safety layers, transparency, and incident response protocols. In AI, we’re shipping black boxes into everything from hospitals to battlefields with little more than vibes and voluntary eval reports.
Progress is happening. But so is risk. And we haven’t earned the right to ignore that tradeoff just because the demos are cool. Amodei’s central warning isn’t just about AI risk. It’s about institutional fragility in the face of exponential systems.
We built something whose trajectory exceeds our cognitive refresh rate - and instead of slowing down or adding guardrails, we’re reaching for a blindfold. Maybe we should look up.