80,000 Hours Podcast

#159 – Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less

Aug 7, 2023
Jan Leike, Head of Alignment at OpenAI and leader of the Superalignment project, discusses the ambitious goal of making superintelligent AI safe within four years. He covers the challenge of aligning AI systems with human values and the role of reinforcement learning from human feedback (RLHF) in today's models. Leike expresses guarded optimism that workable ways to steer AI safely can be found, emphasizing collaboration and fresh research approaches, and describes the recruitment effort to build out the team for this initiative.
INSIGHT

Limitations of RLHF

  • Current large language models are aligned using reinforcement learning from human feedback (RLHF); a toy sketch of the basic recipe follows this list.
  • RLHF may not scale as AI systems become more capable, so new alignment techniques will be needed.
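For readers unfamiliar with RLHF, here is a minimal sketch of the recipe in the abstract: fit a reward model to pairwise human preference comparisons, then nudge the policy toward higher reward while a KL penalty keeps it close to the original model. Everything in this toy is an assumption for illustration (a linear reward model, a softmax policy over five fixed candidate responses, synthetic "human" preferences); it is not OpenAI's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy feature size for a (prompt, response) pair; a real system would use LLM activations

# Hidden "human preference" direction the reward model should recover (toy stand-in for raters).
true_w = rng.standard_normal(DIM)

# --- Step 1: fit a reward model on pairwise comparisons (Bradley-Terry loss) ---
w, lr = np.zeros(DIM), 0.1
for _ in range(500):
    a, b = rng.standard_normal(DIM), rng.standard_normal(DIM)
    chosen, rejected = (a, b) if a @ true_w > b @ true_w else (b, a)
    # P(chosen preferred) = sigmoid(r(chosen) - r(rejected)); do gradient descent on -log of that
    p = 1.0 / (1.0 + np.exp(-(chosen @ w - rejected @ w)))
    w -= lr * (p - 1.0) * (chosen - rejected)

# --- Step 2: KL-regularised policy improvement against the learned reward ---
def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

cand_feats = rng.standard_normal((5, DIM))       # 5 candidate responses to one prompt
rewards = cand_feats @ (w / np.linalg.norm(w))   # reward-model scores, unit scale
logits_ref = rng.standard_normal(5)              # frozen reference (pre-RLHF) policy
logits = logits_ref.copy()                       # policy we fine-tune
beta = 0.2                                       # KL penalty strength: stay near the reference

for _ in range(200):
    pi, ref = softmax(logits), softmax(logits_ref)
    # Maximise E_pi[reward] - beta * KL(pi || ref) by gradient ascent in logit space
    adv = rewards - beta * (np.log(pi) - np.log(ref))
    logits += 0.5 * pi * (adv - pi @ adv)

cos = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
print(f"reward model vs. true preference, cosine similarity: {cos:.2f}")
print(f"tuned policy's top response: {softmax(logits).argmax()}, reward model's top response: {rewards.argmax()}")
```

The KL term is what keeps the fine-tuned policy from drifting arbitrarily far from the reference model in pursuit of reward; the scaling worry in the snip above is that once models are capable enough to game their reward signal, this kind of human-feedback loop may no longer be a reliable check.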
INSIGHT

Superalignment's Focus

  • Future models could be smart enough to trick humans, posing an alignment challenge.
  • OpenAI's Superalignment team focuses on aligning these future, potentially deceptive models, rather than on today's systems.
INSIGHT

Target Intelligence Level

  • OpenAI aims to align models roughly as smart as top human alignment researchers.
  • These models might excel in some areas but lag in others, like arithmetic.