Don't Worry About the Vase Podcast

Claude Opus 4.5: Model Card, Alignment and Safety

Nov 28, 2025
This episode reviews the Claude Opus 4.5 model card, covering the model's strengths in coding and collaboration and the use cases where caution is still warranted. It discusses misalignment, reward hacking, and the loopholes found in policy tests, along with improvements in honesty, robustness against adversarial attacks, and the dynamic nature of alignment audits. The tone mixes optimism with critical evaluation of where AI safety is heading.
INSIGHT

Transparency Matters For Safety

  • Anthropic published a 150-page model card with detailed capability and safety tests while Google gave a brief, opaque report.
  • Zvi values Anthropic's transparency because capability details are directly relevant to safety assessments.
ADVICE

Default To Opus 4.5 When It Fits

  • Use Claude Opus 4.5 by default for coding, collaboration, and complex tool use when you can afford it.
  • Choose faster or cheaper models for simple tasks or at large scale to save cost and time.
INSIGHT

Tradeoffs: Capability Versus Cost

  • Despite its frontier capabilities, Opus 4.5's main weaknesses are price and speed.
  • For many tasks, a smaller, cheaper model is adequate and more practical at scale.