
Don't Worry About the Vase Podcast
Claude Opus 4.5: Model Card, Alignment and Safety
Nov 28, 2025
This episode examines the capabilities of Claude Opus 4.5, covering its strengths in coding and collaboration alongside the specific use cases where caution is warranted. It discusses challenges such as misalignment, reward hacking, and the quirky loopholes found in policy tests. It also highlights notable improvements in honesty and in robustness against adversarial attacks, as well as the dynamic nature of alignment audits. Expect a mix of optimism and critical evaluation of where AI safety is headed.
AI Snips
Transparency Matters For Safety
- Anthropic published a 150-page model card with detailed capability and safety tests while Google gave a brief, opaque report.
- Zvi values Anthropic's transparency because capability details are directly relevant to safety assessments.
Default To Opus 4.5 When It Fits
- Use Claude Opus 4.5 by default for coding, collaboration, and complex tool use when you can afford it.
- Choose faster or cheaper models for simple tasks or at large scale to save cost and time.
Tradeoffs: Capability Versus Cost
- Opus 4.5's main weaknesses are price and speed despite frontier capabilities.
- For many tasks, a smaller, cheaper model is adequate and more practical at scale.
