GPT-5 Made Headlines. GPT-4o’s Removal Made History.
An unplanned global experiment in machine-human attachment - and a preview of AI’s next big product challenge.
OpenAI launched GPT-5 last week - faster, smarter, better on benchmarks. And then, quietly, it pulled GPT-4o from the lineup. Overnight, the model many had built their workflows, prompts, and daily routines around was gone.
On paper, this was progress - GPT-5 outperforms 4o on most benchmarks. In practice, it felt like a personality transplant: different tone, different quirks, different “feel.” Productivity chains broke. Creative outputs lost their spark. And for some, it was like showing up to work and finding your trusted collaborator replaced by an allegedly more qualified stranger.
The backlash was immediate. The mood was less “annoyed at a software update” and more “somebody took my dog.” Reddit lit up with recovery hacks. Twitter/X read like a grief group.
Which is odd, right? No one really mourns version numbers. When Apple releases iOS 18, nobody pines for iOS 17.3.2 because it “understood” them better. But GPT-4o wasn’t a version so much as a personality. And it turns out that once you get used to working - or chatting, or role-playing - with a particular personality, swapping it out feels…personal.
Sam Altman posted a mea culpa: they’d underestimated how much people cared about that specific model. It came back (for now) via a hidden “Show legacy models” toggle. He also admitted something most product teams haven’t had to think about yet - people are forming bonds with specific models strong enough that sudden deprecations carry not just churn risk, but potential harm.
That tweet, openly ruminating on the problem, may turn out to be a historically significant artifact: it signals the next phase of machine-human interaction. Strip away the drama, and last week was a free MBA module in AI product management. Six takeaways stood out:
1. Continuity > Capability
Benchmarks don’t measure belonging. You can give users a model that’s objectively “better,” but if it feels different, you’ve broken the relationship. Once a model’s cadence, humor, or warmth becomes part of someone’s mental furniture, swapping it out is like redecorating their living room without asking. Sure, it’s a better color - but you also threw out the couch they loved.
2. Switching costs now have an emotional line item
In SaaS, switching costs are usually about data migration or retraining staff. In LLMs, they include retraining the AI to be the colleague you remember. The fact that people mourned 4o’s disappearance means model deprecation now carries relationship risk - a concept that doesn’t yet exist in most product roadmaps but probably should.
3. Model loyalty is the new brand loyalty
For decades, software loyalty has been to brands. In AI, it’s becoming loyalty to versions. People didn’t just want “ChatGPT,” they wanted their GPT-4o. This flips the upgrade calculus: “One model to rule them all” may be operationally elegant, but it’s strategically brittle. A portfolio of personalities might be the only way to keep everyone happy - and to stop users from defecting to whichever lab still serves their favorite vintage.
4. Continuity is a safety feature
Altman is right to be uneasy about people relying on ChatGPT for big life decisions. But if your product is going to be in that position, stability may be a greater safety mechanism than over-engineered "helpfulness." Abruptly changing a model's personality, refusal style, or conversational empathy mid-stream risks doing harm even if accuracy improves. Think less "ship it and see" and more "therapist changing their personality between sessions."
5. Relationship SLAs are coming
We have uptime SLAs for infrastructure. We’re going to need “relationship SLAs” for AI:
Advance notice before personality changes
Ability to lock a model version
Public changelogs for tone/behavior shifts
Visible model ID per reply
In other words: if you’re going to change how my AI treats me, I want the same disclosure I’d get if my doctor changed my medication.
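To make "lock a model version" and "visible model ID per reply" concrete, here is a minimal sketch using the OpenAI Python SDK: pin a dated snapshot instead of a floating alias, and echo the exact model that served each reply. The snapshot name and prompt are illustrative only, and whether a given snapshot stays available is entirely up to the provider.

```python
# Minimal sketch: pin a dated model snapshot rather than a floating alias,
# and surface the exact model ID that produced each reply.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set;
# the snapshot name below is an example and may be deprecated over time.
from openai import OpenAI

client = OpenAI()

PINNED_MODEL = "gpt-4o-2024-08-06"  # a dated snapshot, not the "gpt-4o" alias

response = client.chat.completions.create(
    model=PINNED_MODEL,
    messages=[{"role": "user", "content": "Summarize today's standup notes."}],
)

reply = response.choices[0].message.content
# response.model reports which model actually answered the request -
# the "visible model ID per reply" idea in practice.
print(f"[{response.model}] {reply}")
```

Pinning doesn't stop deprecation, but it turns a silent personality swap into an explicit, dated decision the user can see coming.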
6. The next product moat might be choice, not raw IQ
If all the labs are converging on similar performance, the winner may be the one that lets you pick and keep your model - quirks, inefficiencies, and all. Sometimes users don’t want the “best” model. They want the same one they’ve been working with. That’s not irrational; that’s how relationships work.
I expect we’ll eventually see a clean separation between personality and capability. The style layer - tone, humor, refusal patterns - will be modular and independent from the underlying model. That means you’ll be able to swap in a smarter engine without replacing the driver.
This approach is cheaper than keeping old models on life support just to preserve “the vibe,” and it opens up new product surface: sliders for warmth, verbosity, humor, risk tolerance. In that world, personality is a user setting, not a model number - and upgrades feel like your friend got sharper, not like a stranger showed up wearing your friend’s clothes.
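As a toy illustration of what "personality as a user setting" could look like - entirely hypothetical, nothing here is a real API - imagine a small PersonaSettings object (a name invented for this sketch) that renders warmth/verbosity/humor sliders into a system prompt, which then travels with the user regardless of which engine sits underneath.

```python
# Hypothetical sketch: a style layer kept separate from the model choice.
# PersonaSettings and its fields are invented for illustration only.
from dataclasses import dataclass


@dataclass
class PersonaSettings:
    # User-facing "sliders" on a 0.0-1.0 scale.
    warmth: float = 0.7
    verbosity: float = 0.4
    humor: float = 0.5

    def to_system_prompt(self) -> str:
        # Render the sliders into instructions any underlying model can follow.
        def level(x: float) -> str:
            return "low" if x < 0.34 else "medium" if x < 0.67 else "high"

        return (
            f"Respond with {level(self.warmth)} warmth, "
            f"{level(self.verbosity)} verbosity, and {level(self.humor)} humor."
        )


# The same persona persists across engine upgrades; only the engine changes.
persona = PersonaSettings(warmth=0.9, verbosity=0.3, humor=0.6)
print(persona.to_system_prompt())
```

The particular shape doesn't matter; the point is that the thing users are attached to becomes portable state rather than a model number.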
The GPT-4o episode is the first real stress test of AI’s shift from “useful tool” to “familiar counterpart.” In a world where billions will soon talk to machines as often as to other people, trust isn’t just about accuracy - it’s about continuity, consent, and agency. Pull those away suddenly, and you don’t just trigger churn. You break the bond. And unlike a broken feature, a broken bond can’t be hotfixed.