Two Theories of Safety
There's a video making the rounds that I want to share with you, not because it's a traditional argument, but because it shows what argument looks like when it happens in the world rather than on the page. In it, Nate B Jones compares two leading AI companies, OpenAI and Anthropic, by tracing their divergent strategies back to their founders' core beliefs. It's a case study in how principles shape outcomes, and in why the same word ("safety") can mean completely different things depending on your underlying theory of how the world works.
The Setup
The standard narrative about AI companies goes like this: some care about safety and some don't. OpenAI moves fast and breaks things; Anthropic moves cautiously. One is reckless, the other responsible. But Nate B Jones argues this framing is "a gross oversimplification":
This is not a story about one company being reckless and one being cautious. Both Sam Altman and Dario Amodei believe safety matters. They just have fundamentally different theories about how you achieve it.
— Nate B Jones
That word "theories" is doing important work. We're not talking about preferences or attitudes. We're talking about different answers to an epistemological question: How do you know whether something is safe?
Two Answers
Sam Altman's answer comes from Y Combinator, the startup accelerator he led before running OpenAI. At YC, the core lesson is that you cannot theorize your way to a good product. You have to ship. Real users reveal problems that no amount of internal testing can predict. The market teaches you what you cannot learn otherwise.
Applied to AI safety, this becomes: you learn what's safe by deploying. Put the system in front of millions of users. Let them find the failure modes. Iterate rapidly. Safety emerges from the feedback loop.
The best way to make an AI system safe is by iteratively and gradually releasing it into the world, giving society time to adapt and co-evolve with technology.
— Sam Altman
Dario Amodei's answer comes from a different background—physics, computational neuroscience, years of research before touching AI. For Amodei, safety is not something that emerges from deployment. It's a precondition for deployment. You demonstrate safety affirmatively before you scale.
This isn't just caution. It's a different epistemology. Where Altman asks "what will users tell us?", Amodei asks "what can we prove?" Anthropic built AI Safety Levels modeled on biosafety protocols for handling pathogens. At ASL-3, the level for systems that could meaningfully assist in creating bioweapons, deployment requires an affirmative showing that there is no meaningful risk of catastrophic misuse. The standard isn't "we tested it and didn't find problems." It's "we can demonstrate it's safe."
Why This Matters for the Course
I'm sharing this video because it demonstrates something central to what we're doing this semester: arguments happen through principles, and competing principles produce genuinely different conclusions even when everyone agrees on the goal.
Both founders believe AI should be safe. Neither is lying about this. But they hold different theories about how safety is achieved—and those theories trace back to their backgrounds, their training, what they learned before they ever touched artificial intelligence. Altman learned at YC that the market reveals truth. Amodei learned in the lab that you prove things before you release them. Same goal, different epistemologies, radically different companies.
This is what argument from principle looks like in practice. It's not a debate where one side is pro-safety and one is anti-safety. It's a disagreement about how the world works, rooted in different experiences that made different beliefs seem obvious. And because the beliefs are about how knowledge is acquired—how you learn whether something is safe—the disagreement can't be resolved by citing more facts. It requires examining the underlying theories themselves.
Picking Sides
Jones explicitly refuses to declare a winner, and I think that's the wrong move. Not because one company is clearly right and the other clearly wrong, but because refusing to choose is itself a choice—a way of avoiding the argument rather than engaging with it.
Here's how I'd think about picking a side: What do you believe about feedback loops and scale? If you think real-world deployment is the only reliable source of information about complex systems—if you believe that internal testing will always miss what millions of users discover—then Altman's approach follows logically. Ship, learn, iterate. The public is your red team because the public reveals what you cannot predict.
But if you think there are categories of harm that shouldn't be discovered through deployment—if you believe that some failure modes are catastrophic enough that finding them in production is unacceptable—then Amodei's approach follows logically. Prove safety first. The standard should be affirmative demonstration, not absence of discovered problems.
Your answer depends on what you believe about the nature of AI risk and the reliability of pre-deployment testing. Neither answer is stupid. But they are incompatible, and at some point you have to commit.
The Argument That Isn't Written
What I find most valuable about this video is that it shows argument happening outside the essay form. Jones is making claims, providing evidence, developing reasons—but he's doing it in a 15-minute YouTube video, not a peer-reviewed paper. The argument is embodied in how two companies actually operate, traced back to the biographies and beliefs of their founders.
This is the kind of argument that shapes the world more than most academic papers do. The principles at stake will determine how AI develops over the next decade. And the people making the argument aren't writing essays—they're building companies, shipping products, making millions of decisions that encode their theories into reality.
Learning to see this kind of argument—to trace outcomes back to principles, to understand how competing theories produce competing actions—is exactly what I hope you'll develop this semester. The essays you write should work the same way: claims that rest on deeper principles, principles that can be articulated and defended and challenged. Not just "I think X," but "I think X because I believe Y about how the world works."
Watch the video. Pay attention to how Jones constructs his case—how he uses biography as evidence for intellectual commitments, how he traces company strategy back to founder epistemology. Then ask yourself: which theory of safety do you find more compelling, and why?