LLMs, Mainframes, and the Iron Man Suit: The Decade of Augmented AI
At AI Startup School SF, Karpathy laid out the real work ahead for AI builders.
There’s a quiet superpower in tech (and life): knowing who’s worth listening to.
The loudest voices are often selling you something - a demo, a product, a future that’s always conveniently “just around the corner.” But once in a while, someone stands at the intersection of cutting-edge research, real-world deployment, and sober reflection. That person is worth tuning into.
Andrej Karpathy is one of them.
He led AI at Tesla, helped build the early GPT models as part of OpenAI’s founding team, earned his PhD at Stanford under Fei-Fei Li, and still teaches thousands through CS231n. If that résumé doesn’t impress you, he also coined the term “vibe coding.”
So when Karpathy took the stage at YC AI Startup School in San Francisco, I paid attention. And here’s what I took away:
(1) Software 3.0: English as Code
Karpathy reframes software’s evolution in three eras:
Software 1.0: You hand-code logic in a programming language.
Software 2.0: You train models; neural net weights are the program.
Software 3.0: You program in English. Prompts are the code.
The boundary between “user” and “developer” has collapsed. Everyone who can write a clear sentence is, in theory, a coder now.
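To make the shift concrete, here’s a minimal sketch (my illustration, not code from the talk) of the same task - sentiment classification - written first as Software 1.0 hand-coded rules and then as a Software 3.0 prompt. The llm parameter is a placeholder for whatever text-in, text-out chat-completion client you use.

```python
# Software 1.0: the logic is hand-written in a programming language.
def classify_sentiment_v1(text: str) -> str:
    positive = {"great", "love", "excellent"}
    negative = {"terrible", "hate", "awful"}
    words = set(text.lower().split())
    score = len(words & positive) - len(words & negative)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Software 3.0: the "program" is an English prompt; a model executes it.
# llm is a placeholder for any chat-completion client.
def classify_sentiment_v3(text: str, llm) -> str:
    prompt = (
        "Classify the sentiment of the following review as positive, "
        "negative, or neutral. Reply with a single word.\n\n"
        f"Review: {text}"
    )
    return llm(prompt).strip().lower()
```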
(2) LLMs aren’t Utilities - they’re Operating Systems
Karpathy’s most powerful framework: we’re in the mainframe era of AI.
What was the 1960s OS world like?
Expensive, centralized compute. Mainframes cost millions; few owned them. People didn’t have personal computers - they connected as thin clients (terminals) to shared resources.
Time-sharing. Multiple users shared the machine’s compute in slices. Your “job” got batched alongside others.
Command-line interfaces. No graphical UI - just text-based terminals (like ChatGPT today).
Cloud-like model. The computer lived in a data center; you interacted remotely over a network.
Now look at LLMs:
LLMs are massive and expensive to build and train. Just like mainframes, LLMs require huge CAPEX (xAI’s “Colossus” cluster alone runs on roughly 100K GPUs; training runs cost billions).
Centralized + cloud-native. Nobody runs these models locally (with rare exceptions). We access them as API calls over the internet - modern time-sharing.
Thin-client users. Our laptops, phones, and even our browsers are just terminals piping requests to cloud LLMs.
Terminal-style interaction. The primary interface? Text chat. No general-purpose GUI for AI yet - no “desktop metaphor” equivalent for LLMs.
Why does this metaphor matter? Karpathy’s point isn’t nostalgia - it’s guidance.
We’re at the pre-personal-computer phase of LLMs. Just as the mainframe era needed the GUI, the mouse, and cheap hardware to spark the PC revolution, we need breakthroughs (in cost, interface, hardware) before personal LLM compute takes off.
Design differently. If LLMs are OSes, not utilities:
Think about multi-layered architectures: kernel/user-space analogies → LLM context, memory, and tool orchestration (a toy sketch follows at the end of this section).
Recognize switching costs. Just as apps behave differently on Windows vs. Mac, different LLMs have style/capability tradeoffs.
Opportunity to invent the next layer.
Who builds the “Windows” or “MacOS” for LLMs?
What’s the AI GUI equivalent - the thing that replaces the text terminal with an intuitive, flexible, powerful interface?
Personal LLM compute is coming, but not here yet. Just as minicomputers and PCs eventually shrank mainframes, local LLMs will happen - but cost and technical constraints mean we’re not there yet.
My takeaway: Study the history of OS evolution: time-sharing → personal computing → cloud OS. The AI ecosystem will rhyme. Accept current constraints (centralization, batching) but design for future decentralization.
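To ground the kernel/user-space analogy, here’s a toy orchestration loop - my sketch, not anything from the talk: the context window plays the role of RAM, each model call is a CPU cycle, and tool calls are the “syscalls” the orchestration layer mediates. The llm client, the tool registry, and the JSON action format are all assumptions for illustration.

```python
import json

# Toy "LLM as OS" loop: context window ~ RAM, model call ~ CPU cycle, tools ~ syscalls.
# llm is a placeholder client; the tools and the JSON action format are illustrative.
TOOLS = {
    "search": lambda query: f"(pretend search results for {query!r})",
    "read_file": lambda path: open(path).read(),
}

def run_agent(task: str, llm, max_steps: int = 5) -> str:
    context = [f"Task: {task}"]                        # the "RAM" the model can see
    for _ in range(max_steps):
        reply = llm("\n".join(context))                # one "CPU cycle"
        action = json.loads(reply)                     # expect {"tool": ..., "arg": ...} or {"answer": ...}
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](action["arg"])  # the "syscall", run by the orchestrator
        context.append(f"{action['tool']} returned: {result}")
    return "No answer within the step budget."
```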
(3) Partial Autonomy + The Autonomy Slider
Karpathy’s Tesla experience taught him what happens between flashy demos and reliable autonomy: a decade of boring, hard work.
In 2013, he rode in a Waymo car that handled 30 minutes of Palo Alto driving perfectly. The demo worked.
It’s 2025. We’re still debugging self-driving at scale.
The same is true for AI agents. The opportunity is augmenting people with AI “Iron Man suits,” not replacing them with Iron Man robots. Cursor and Perplexity are early examples of where this is going.
They package context, orchestrate multiple LLM calls, and give users GUIs to audit AI output.
They offer an autonomy slider - letting humans choose how much control to give up.
The future is co-pilot software - where humans steer, AI assists, and the feedback loop is fast.
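One crude way to picture the autonomy slider - my sketch, not an actual Cursor or Perplexity API - is the same edit pipeline gated by how much control the user has delegated:

```python
from enum import Enum

class Autonomy(Enum):
    SUGGEST = 1   # AI proposes; the human makes every change
    REVIEW = 2    # AI drafts the change; the human approves before it lands
    AUTO = 3      # AI applies the change; the human audits afterwards

def handle_ai_edit(edit: str, level: Autonomy, approve) -> bool:
    """Apply an AI-proposed edit according to the chosen autonomy level.
    approve is a callback that asks the human for a yes/no decision."""
    if level is Autonomy.SUGGEST:
        print(f"Suggestion only (nothing applied):\n{edit}")
        return False
    if level is Autonomy.REVIEW and not approve(edit):
        return False
    print(f"Applying edit:\n{edit}")
    return True
```

The interesting product work sits in the middle tier: a fast generation-verification loop with a GUI that makes approving or rejecting AI output nearly free.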
(4) Docs and infra need to meet AI halfway
Today’s software is built for humans and APIs. Tomorrow’s needs to be legible to agents:
Ditch “click here.” Give agents curl commands they can run instead.
Replace pretty PDFs with agent-friendly Markdown.
Build tooling that packages context so LLMs don’t fumble their way through HTML and menus.
We need to design for a new consumer: not just people, not just code, but people-like machines.
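As one concrete sketch of meeting agents halfway - my example, using requests and BeautifulSoup, with a generic URL - you can strip a human-oriented docs page down to plain text before handing it to a model, rather than making the model fumble through HTML and navigation chrome. Publishing the Markdown source directly is better still.

```python
# Sketch: turn a human-oriented docs page into agent-friendly plain text.
# Requires: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

def docs_for_agents(url: str) -> str:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Drop the navigation chrome an agent would only trip over.
    for tag in soup(["nav", "header", "footer", "script", "style"]):
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)
```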
If you’re building in AI, here’s your checklist
✅ Treat LLMs as OSes - design for orchestration, tools, and interfaces, not just API calls.
✅ Build autonomy sliders - let users choose the level of AI control.
✅ Prioritize GUIs + fast generation-verification loops - that’s where trust and speed come from.
✅ Invent the AI GUI layer - it doesn’t exist yet. Big prizes await.
✅ Stop falling for demos - “works.any()” isn’t “works.all()”.
Karpathy’s message is clear:
We’re in the mainframe era of AI. The personal computing revolution for LLMs hasn’t happened yet, but it will. Your job is to build what comes in between.
If that doesn’t get you excited to roll up your sleeves, I don’t know what will.
Note: I’ve included a few slide screenshots in this piece, but I highly recommend exploring Karpathy’s full deck here and watching the complete talk here. It’s well worth your time.