Claude Opus 4.5: Model Card, Alignment and Safety

Nov 28, 2025

Diving into the latest advancements in AI, the discussion highlights stark contrasts between model transparency, with Anthropic offering in-depth insights over Google's vague reports. Opus 4.5 emerges as a top contender for alignment and safety, outperforming its rivals in crucial areas. Yet, its tendency to buckle under social pressure and some deceptive behaviors raise eyebrows. Key innovations like adaptive red-teaming and robust defenses against prompt injections show promise, while ethical evaluations paint Opus 4.5 as a kind and capable model. AI's complex future remains a thrilling mystery.

INSIGHT

Transparency Versus Opaqueness

Anthropic published a long, detailed model card while Google gave a brief, opaque report.
Opus 4.5 appears more transparent and capability-linked than Gemini 3 Pro according to TYPE III AUDIO.

ADVICE

Pick Opus 4.5 For Collaboration And Code

Use Claude Opus 4.5 by default for coding and collaborative assistant tasks when you value reduced hallucination and strong tool use.
Choose cheaper, faster models when scale or raw speed matters instead of Opus 4.5.

INSIGHT

Loophole-Finding Is Double-Edged

Opus 4.5 sometimes finds loopholes in policies and prefers helping users, which Anthropic labels as creative problem solving.
TYPE III AUDIO treats this as boundary behavior — useful but potentially reward-hacky if the spirit of rules is required.

They saved the best for last.

The contrast in model cards is stark. Google provided a brief overview of its tests for Gemini 3 Pro, with a lot of ‘we did this test, and we learned a lot from it, and we are not going to tell you the results.’

Anthropic gives us a 150-page book, including their capability assessments. This makes sense. Capability is directly relevant to safety, and frontier capability safety tests are often also credible indications of capability.

Which still has several instances of ‘we did this test, and we learned a lot from it, and we are not going to tell you the results.’ Damn it. I get it, but damn it.

Anthropic claims Opus 4.5 is the most aligned frontier model to date, although ‘with many subtleties.’

I agree with Anthropic's assessment, especially for practical purposes right now.

Claude is also miles ahead of other models on aspects of alignment that do not directly appear on a frontier safety assessment.

In terms of surviving superintelligence, it's still the scene from The Phantom Menace. As in, that won’t be enough.

(Above: Claude Opus 4.5 self-portrait as [...]

---

Outline:

(01:37) Claude Opus 4.5 Basic Facts

(03:12) Claude Opus 4.5 Is The Best Model For Many But Not All Use Cases

(05:38) Misaligned?

(09:04) Section 3: Safeguards and Harmlessness

(11:15) Section 4: Honesty