Goodfire raised $50 million recently to advance its goal of breaking open the black box... Do you think there's a chance of that happening by 2027 (Amodei's stated goal/timeline)?
Great question, Keyur - I’m cautiously optimistic, but 2027 is an ambitious timeline for cracking the black box. Here’s where my skepticism comes in:
- Scale is the enemy. Most interpretability wins are still happening on toy models, while frontier systems keep ballooning in size and complexity.
- We lack generalizable tools. Most mechinterp breakthroughs are bespoke - brilliant insights, but handcrafted for one model or one circuit at a time. It’s still brain surgery with a flashlight and tweezers.
- “Understanding” is a moving goalpost. Do we want causal traceability? Human-readable reasoning? Predictive auditability? Right now it’s all of the above, and that makes defining success (or progress) tricky.
- Incentives still favor capability over clarity, though this may shift as regulation, safety, and trust become strategic priorities.
That said, the momentum is real. Startups like Goodfire, teams at Anthropic, and growing policy interest in “model provenance” all point in the right direction. Even if we don’t fully break open the box by 2027, I’m optimistic we’ll have powerful tools to light up circuits, isolate failure modes, and build more transparent, trustworthy systems. Call it partial X-ray vision - not full interpretability, but enough to see inside when it matters.