The Inside View

Ethan Perez–Inverse Scaling, Language Feedback, Red Teaming

Aug 24, 2022

INSIGHT

Inverse Scaling Reveals Alignment Failures

  • The Inverse Scaling Prize seeks tasks on which larger language models perform worse than smaller ones, exposing alignment failures.
  • These failures often arise because models amplify undesirable patterns present in their training data.

INSIGHT

Scaling Exposes Misalignment Early

  • Because some alignment failures worsen as models become more capable, measuring them across model sizes can reveal misalignment early.
  • Tracking how loss and behavior trend with scale can help anticipate these failures before deployment.

ADVICE

Submit to Inverse Scaling Prize

  • Submit tasks demonstrating inverse scaling to compete for up to $100k in the Inverse Scaling Prize.
  • Make sure tasks are important, show a clear inverse scaling trend, and demonstrate it across multiple models.
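To make "a clear inverse scaling trend across multiple models" concrete, here is a minimal sketch. It is not the prize's actual evaluation procedure. It fits a least-squares line of task accuracy against log10(parameter count) for each model family, then flags the task if accuracy declines with scale in at least two families. The function names, the two-family threshold, and all accuracy numbers are hypothetical placeholders. If you track loss instead of accuracy, a rising (positive) slope would be the inverse-scaling signal.

```python
import math


def scaling_slope(param_counts, scores):
    """Least-squares slope of score against log10(parameter count)."""
    xs = [math.log10(n) for n in param_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("need at least two distinct model sizes")
    return cov / var


def shows_inverse_scaling(families, min_families=2):
    """Hypothetical check: families maps a family name to a list of
    (param_count, accuracy) pairs. Returns True if accuracy falls with
    scale (negative slope) in at least `min_families` families."""
    declining = 0
    for points in families.values():
        params, accs = zip(*points)
        if len(set(params)) < 2:
            continue  # a single size cannot show a trend
        if scaling_slope(params, accs) < 0:
            declining += 1
    return declining >= min_families


# Made-up numbers for illustration only; not real benchmark results.
hypothetical = {
    "family_a": [(1.3e8, 0.62), (1.3e9, 0.55), (1.3e10, 0.48)],
    "family_b": [(3.5e8, 0.60), (6.7e9, 0.51), (1.75e11, 0.44)],
}
print(shows_inverse_scaling(hypothetical))  # True
```

Requiring the decline in more than one family is one way to guard against a trend that is an artifact of a single model series.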