
Jan Leike

Head of Alignment at OpenAI, co-leading the Superalignment project focused on making superintelligent AI systems safe.

Top 5 podcasts with Jan Leike

Ranked by the Snipd community
75 snips
Aug 7, 2023 • 2h 51min

#159 – Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less

Jan Leike, Head of Alignment at OpenAI and leader of the Superalignment project, discusses the ambitious goal of safely developing superintelligent AI within four years. He addresses the challenges of aligning AI with human values and the importance of Reinforcement Learning from Human Feedback (RLHF). Leike expresses guarded optimism about finding solutions to steer AI safely, emphasizing collaboration and innovative approaches in tackling these complex issues. The conversation also highlights recruitment efforts to build a team for this critical initiative.
23 snips
Jul 27, 2023 • 2h 8min

24 - Superalignment with Jan Leike

Recently, OpenAI made a splash by announcing a new "Superalignment" team. Led by Jan Leike and Ilya Sutskever, the team would consist of top researchers attempting to solve alignment for superintelligent AIs in four years by figuring out how to build a trustworthy human-level AI alignment researcher, and then using it to solve the rest of the problem. But what does this plan actually involve? In this episode, I talk to Jan Leike about the plan and the challenges it faces.

Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
Episode art by Hamish Doodles: hamishdoodles.com/

Topics we discuss, and timestamps:
- 0:00:37 - The superalignment team
- 0:02:10 - What's a human-level automated alignment researcher?
  - 0:06:59 - The gap between human-level automated alignment researchers and superintelligence
  - 0:18:39 - What does it do?
  - 0:24:13 - Recursive self-improvement
- 0:26:14 - How to make the AI AI alignment researcher
  - 0:30:09 - Scalable oversight
  - 0:44:38 - Searching for bad behaviors and internals
  - 0:54:14 - Deliberately training misaligned models
- 1:02:34 - Four year deadline
  - 1:07:06 - What if it takes longer?
- 1:11:38 - The superalignment team and...
  - 1:11:38 - ... governance
  - 1:14:37 - ... other OpenAI teams
  - 1:18:17 - ... other labs
- 1:26:10 - Superalignment team logistics
- 1:29:17 - Generalization
- 1:43:44 - Complementary research
- 1:48:29 - Why is Jan optimistic?
  - 1:58:32 - Long-term agency in LLMs?
  - 2:02:44 - Do LLMs understand alignment?
- 2:06:01 - Following Jan's research

The transcript: axrp.net/episode/2023/07/27/episode-24-superalignment-jan-leike.html

Links for Jan and OpenAI:
- OpenAI jobs: openai.com/careers
- Jan's substack: aligned.substack.com
- Jan's twitter: twitter.com/janleike

Links to research and other writings we discuss:
- Introducing Superalignment: openai.com/blog/introducing-superalignment
- Let's Verify Step by Step (process-based feedback on math): arxiv.org/abs/2305.20050
- Planning for AGI and beyond: openai.com/blog/planning-for-agi-and-beyond
- Self-critiquing models for assisting human evaluators: arxiv.org/abs/2206.05802
- An Interpretability Illusion for BERT: arxiv.org/abs/2104.07143
- Language models can explain neurons in language models: openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html
- Our approach to alignment research: openai.com/blog/our-approach-to-alignment-research
- Training language models to follow instructions with human feedback (aka the InstructGPT paper): arxiv.org/abs/2203.02155
22 snips
Sep 29, 2021 • 1h 5min

96. Jan Leike - AI alignment at OpenAI

The more powerful our AIs become, the more we'll have to ensure that they're doing exactly what we want. If we don't, we risk building AIs that pursue dangerously creative solutions with undesirable, or downright dangerous, side effects. Even a slight misalignment between the motives of a sufficiently advanced AI and human values could be hazardous. That's why leading AI labs like OpenAI are already investing significant resources into AI alignment research. Understanding that research is important if you want to understand where advanced AI systems might be headed, and what challenges we might encounter as AI capabilities continue to grow. That's what this episode of the podcast is all about.

My guest today is Jan Leike, head of AI alignment at OpenAI, and an alumnus of DeepMind and the Future of Humanity Institute. As someone who works directly with some of the world's largest AI systems (including OpenAI's GPT-3), Jan has a unique and interesting perspective to offer, both on the current challenges facing alignment researchers and on the most promising future directions the field might take.

---

Intro music:
➞ Artist: Ron Gelinas
➞ Track Title: Daybreak Chill Blend (original mix)
➞ Link to Track: https://youtu.be/d8Y2sKIgFWc

---

Chapters:
0:00 Intro
1:35 Jan's background
7:10 Timing of scalable solutions
16:30 Recursive reward modeling
24:30 Amplification of misalignment
31:00 Community focus
32:55 Wireheading
41:30 Arguments against the democratization of AIs
49:30 Differences between capabilities and alignment
51:15 Research to focus on
1:01:45 Formalizing an understanding of personal experience
1:04:04 OpenAI hiring
1:05:02 Wrap-up
19 snips
Aug 20, 2019 • 33min

AI, Robot

Forget what sci-fi has told you about superintelligent robots that are uncannily human-like; the reality is more prosaic. Inside DeepMind's robotics laboratory, Hannah explores what researchers call 'embodied AI': robot arms that are learning tasks like picking up plastic bricks, which humans find comparatively easy. Discover the cutting-edge challenges of bringing AI and robotics together, and of learning from scratch how to perform tasks. She also explores some of the key questions about using AI safely in the real world.

If you have a question or feedback on the series, message us on Twitter (@DeepMind using the hashtag #DMpodcast) or email us at podcast@deepmind.com.

Further reading:
- Blogs on AI safety and further resources from Victoria Krakovna
- The Future of Life Institute: The risks and benefits of AI
- The Wall Street Journal: Protecting Against AI's Existential Threat
- TED Talks: Max Tegmark - How to get empowered, not overpowered, by AI
- Royal Society lecture series sponsored by DeepMind: You & AI
- Nick Bostrom: Superintelligence: Paths, Dangers and Strategies (book)
- OpenAI: Learning from Human Preferences
- DeepMind blog: Learning from human preferences
- DeepMind blog: Learning by playing - how robots can tidy up after themselves
- DeepMind blog: AI safety

Interviewees: Software engineer Jackie Kay and research scientists Murray Shanahan, Victoria Krakovna, Raia Hadsell and Jan Leike.

Credits:
Presenter: Hannah Fry
Editor: David Prest
Senior Producer: Louisa Field
Producers: Amy Racs, Dan Hardoon
Binaural Sound: Lucinda Mason-Brown
Music composition: Eleni Shaw (with help from Sander Dieleman and WaveNet)
Commissioned by DeepMind

Please like and subscribe on your preferred podcast platform. Want to share feedback? Or have a suggestion for a guest that we should have on next? Leave us a comment on YouTube and stay tuned for future episodes.
6 snips
Mar 18, 2025 • 13min

OpenAI just had a BLACK SWAN "iPhone Moment" with GPT-4o - Here's what that means for Google...

Ilya Sutskever, co-founder of OpenAI and a leading AI researcher, joins Jan Leike, an expert in AI safety, to discuss the groundbreaking implications of GPT-4o, likening its release to an 'iPhone moment' for the industry. They dive into how this technology could reshape traditional business models, especially in search. The duo also teases their upcoming book, 'Heavy Silver', and reflects on the ethical responsibilities of AI development. Their insights highlight both the excitement and the challenges of navigating this new AI frontier.