Podcast episode for Claude Opus 4.5: Model Card, Alignment and Safety.
* 00:00:00 - Introduction
* 00:01:50 - Claude Opus 4.5 Basic Facts
* 00:03:26 - Claude Opus 4.5 Is The Best Model For Many But Not All Use Cases
* 00:06:02 - Misaligned?
* 00:09:39 - Section 3: Safeguards and Harmlessness
* 00:11:46 - Section 4: Honesty
* 00:13:27 - Section 5: Agentic Safety
* 00:21:01 - Section 6: Alignment Overview
* 00:29:55 - Alignment Investigations
* 00:30:35 - Sycophancy Course Correction Is Lacking
* 00:31:52 - Deception
* 00:34:29 - Ruling Out Encoded Content In Chain Of Thought
* 00:37:19 - Sandbagging
* 00:38:10 - Evaluation Awareness
* 00:42:18 - Reward Hacking
* 00:43:59 - Subversion Strategy
* 00:45:30 - 6.13: UK AISI External Testing
* 00:45:39 - 6.14: Model Welfare
* 00:46:33 - Section 7: RSP Evaluations
* 00:48:12 - CBRN
* 00:56:36 - Autonomy
* 01:04:27 - Cyber
* 01:10:32 - The Whisperers Love The Vibes
The Don’t Worry About the Vase Podcast is a listener-supported podcast. To receive new posts and support the cost of creation, consider becoming a free or paid subscriber.
https://open.substack.com/pub/thezvi/p/claude-opus-45-model-card-alignment?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false
Get full access to DWAtV Podcast at dwatvpodcast.substack.com/subscribe