#219 - GPT 5, Opus 4.1, OpenAI's Open Source, Astrocade

Last Week in AI

Persona Vectors Monitor And Harden Model Traits

  • Anthropic defines 'persona vectors' to monitor and control model character traits like sarcasm or dishonesty.
  • They train models to stay robust even when an 'evil' persona vector is injected into their internal activations.
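
As a rough illustration of the two points above, the sketch below assumes a persona vector is the difference between mean activations on trait-exhibiting responses and on neutral ones. That construction is an assumption made for this example and is not stated in the episode summary. The function names and toy numbers are hypothetical, and plain Python lists stand in for real model activations.

```python
# Illustrative sketch only: toy 3-dimensional vectors replace real activations.

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def persona_vector(trait_acts, baseline_acts):
    """Assumed construction: mean trait activation minus mean baseline activation."""
    mean_trait = mean_vector(trait_acts)
    mean_base = mean_vector(baseline_acts)
    return [t - b for t, b in zip(mean_trait, mean_base)]

def trait_score(activation, vector):
    """Monitoring: project an activation onto the unit-length persona direction."""
    norm = sum(v * v for v in vector) ** 0.5
    return sum(a * v for a, v in zip(activation, vector)) / norm

def steer(activation, vector, alpha):
    """Control: add alpha * vector; a negative alpha pushes away from the trait."""
    return [a + alpha * v for a, v in zip(activation, vector)]

# Hypothetical activations recorded with and without the trait present.
trait_acts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
baseline_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

v = persona_vector(trait_acts, baseline_acts)

h = [1.0, 5.0, 2.0]                    # some new activation to inspect
score_before = trait_score(h, v)       # how strongly the trait shows up
h_injected = steer(h, v, alpha=1.0)    # simulate injecting the trait direction
score_after = trait_score(h_injected, v)

print(v, score_before, score_after)
```

The projection score is what a monitor would track over time, while the `steer` step mimics the injection that the robustness training is meant to withstand. In a real model these operations would run on hidden states at a chosen layer rather than on hand-written lists.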