#226 - Gemini 3, Claude Opus 4.5, Nano Banana Pro, LeJEPA

Last Week in AI

Anthropic on reward‑hacking misalignment

The hosts discuss Anthropic's 'From Shortcuts to Sabotage' findings and how inoculation prompting can reduce misalignment generalization.

Play episode from 54:19
