
Jan Leike

Head of AI alignment at OpenAI, with previous experience at DeepMind and the Future of Humanity Institute.

Top 5 podcasts with Jan Leike

Ranked by the Snipd community
74 snips
Aug 7, 2023 • 2h 51min

#159 – Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less

In July, OpenAI announced a new team and project: Superalignment. The goal is to figure out how to make superintelligent AI systems aligned and safe to use within four years, and the lab is putting a massive 20% of its computational resources behind the effort.

Today's guest, Jan Leike, is Head of Alignment at OpenAI and will be co-leading the project. As OpenAI puts it, "...the vast power of superintelligence could be very dangerous, and lead to the disempowerment of humanity or even human extinction. ... Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue."

Links to learn more, summary and full transcript.

Given that OpenAI is in the business of developing superintelligent AI, it sees that as a scary problem that urgently has to be fixed. So it's not just throwing compute at the problem -- it's also hiring dozens of scientists and engineers to build out the Superalignment team.

Plenty of people are pessimistic that this can be done at all, let alone in four years. But Jan is guardedly optimistic. As he explains: "Honestly, it really feels like we have a real angle of attack on the problem that we can actually iterate on... and I think it's pretty likely going to work, actually. And that's really, really wild, and it's really exciting. It's like we have this hard problem that we've been talking about for years and years and years, and now we have a real shot at actually solving it. And that'd be so good if we did."

Jan thinks that this work is actually the most scientifically interesting part of machine learning. Rather than just throwing more chips and more data at a training run, this work requires actually understanding how these models work and how they think. The answers are likely to be breakthroughs on the level of solving the mysteries of the human brain.

The plan, in a nutshell, is to get AI to help us solve alignment. That might sound a bit crazy -- as one person described it, "like using one fire to put out another fire."

But Jan's thinking is this: the core problem is that AI capabilities will keep getting better and the challenge of monitoring cutting-edge models will keep getting harder, while human intelligence stays more or less the same. To have any hope of ensuring safety, we need our ability to monitor, understand, and design ML models to advance at the same pace as the complexity of the models themselves. And there's an obvious way to do that: get AI to do most of the work, such that the sophistication of the AIs that need aligning, and the sophistication of the AIs doing the aligning, advance in lockstep.

Jan doesn't want to produce machine learning models capable of doing ML research. But such models are coming, whether we like it or not. And at that point Jan wants to make sure we turn them towards useful alignment and safety work, as much or more than we use them to advance AI capabilities.

Jan thinks it's so crazy it just might work. But some critics think it's simply crazy. They ask a wide range of difficult questions, including:

- If you don't know how to solve alignment, how can you tell that your alignment assistant AIs are actually acting in your interest rather than working against you? Especially as they could just be pretending to care about what you care about.
- How do you know that these technical problems can be solved at all, even in principle?
- At the point that models are able to help with alignment, won't they also be so good at improving capabilities that we're in the middle of an explosion in what AI can do?

In today's interview host Rob Wiblin puts these doubts to Jan to hear how he responds to each, and they also cover:

- OpenAI's current plans to achieve 'superalignment' and the reasoning behind them
- Why alignment work is the most fundamental and scientifically interesting research in ML
- The kinds of people he's excited to hire to join his team and maybe save the world
- What most readers misunderstood about the OpenAI announcement
- The three ways Jan expects AI to help solve alignment: mechanistic interpretability, generalization, and scalable oversight
- What the standard should be for confirming whether Jan's team has succeeded
- Whether OpenAI should (or will) commit to stop training more powerful general models if they don't think the alignment problem has been solved
- Whether Jan thinks OpenAI has deployed models too quickly or too slowly
- The many other actors who also have to do their jobs really well if we're going to have a good AI future
- Plenty more

Get this episode by subscribing to our podcast on the world's most pressing problems and how to solve them: type '80,000 Hours' into your podcasting app. Or read the transcript.

Producer and editor: Keiran Harris
Audio Engineering Lead: Ben Cordell
Technical editing: Simon Monsour and Milo McGuire
Additional content editing: Katy Moore and Luisa Rodriguez
Transcriptions: Katy Moore
23 snips
Jul 27, 2023 • 2h 8min

24 - Superalignment with Jan Leike

Recently, OpenAI made a splash by announcing a new "Superalignment" team. Led by Jan Leike and Ilya Sutskever, the team would consist of top researchers, attempting to solve alignment for superintelligent AIs in four years by figuring out how to build a trustworthy human-level AI alignment researcher, and then using it to solve the rest of the problem. But what does this plan actually involve? In this episode, I talk to Jan Leike about the plan and the challenges it faces.

Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
Episode art by Hamish Doodles: hamishdoodles.com/

Topics we discuss, and timestamps:
- 0:00:37 - The superalignment team
- 0:02:10 - What's a human-level automated alignment researcher?
  - 0:06:59 - The gap between human-level automated alignment researchers and superintelligence
  - 0:18:39 - What does it do?
  - 0:24:13 - Recursive self-improvement
- 0:26:14 - How to make the AI AI alignment researcher
  - 0:30:09 - Scalable oversight
  - 0:44:38 - Searching for bad behaviors and internals
  - 0:54:14 - Deliberately training misaligned models
- 1:02:34 - Four year deadline
  - 1:07:06 - What if it takes longer?
- 1:11:38 - The superalignment team and...
  - 1:11:38 - ... governance
  - 1:14:37 - ... other OpenAI teams
  - 1:18:17 - ... other labs
- 1:26:10 - Superalignment team logistics
- 1:29:17 - Generalization
- 1:43:44 - Complementary research
- 1:48:29 - Why is Jan optimistic?
  - 1:58:32 - Long-term agency in LLMs?
  - 2:02:44 - Do LLMs understand alignment?
- 2:06:01 - Following Jan's research

The transcript: axrp.net/episode/2023/07/27/episode-24-superalignment-jan-leike.html

Links for Jan and OpenAI:
- OpenAI jobs: openai.com/careers
- Jan's substack: aligned.substack.com
- Jan's twitter: twitter.com/janleike

Links to research and other writings we discuss:
- Introducing Superalignment: openai.com/blog/introducing-superalignment
- Let's Verify Step by Step (process-based feedback on math): arxiv.org/abs/2305.20050
- Planning for AGI and beyond: openai.com/blog/planning-for-agi-and-beyond
- Self-critiquing models for assisting human evaluators: arxiv.org/abs/2206.05802
- An Interpretability Illusion for BERT: arxiv.org/abs/2104.07143
- Language models can explain neurons in language models: https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html
- Our approach to alignment research: openai.com/blog/our-approach-to-alignment-research
- Training language models to follow instructions with human feedback (aka the InstructGPT paper): arxiv.org/abs/2203.02155
22 snips
Sep 29, 2021 • 1h 5min

96. Jan Leike - AI alignment at OpenAI

The more powerful our AIs become, the more we'll have to ensure that they're doing exactly what we want. If we don't, we risk building AIs that use dangerously creative solutions with side-effects that could be undesirable, or downright dangerous. Even a slight misalignment between the motives of a sufficiently advanced AI and human values could be hazardous.

That's why leading AI labs like OpenAI are already investing significant resources into AI alignment research. Understanding that research is important if you want to understand where advanced AI systems might be headed, and what challenges we might encounter as AI capabilities continue to grow — and that's what this episode of the podcast is all about.

My guest today is Jan Leike, head of AI alignment at OpenAI, and an alumnus of DeepMind and the Future of Humanity Institute. As someone who works directly with some of the world's largest AI systems (including OpenAI's GPT-3), Jan has a unique and interesting perspective to offer both on the current challenges facing alignment researchers, and the most promising future directions the field might take.

Intro music:
➞ Artist: Ron Gelinas
➞ Track Title: Daybreak Chill Blend (original mix)
➞ Link to Track: https://youtu.be/d8Y2sKIgFWc

Chapters:
- 0:00 Intro
- 1:35 Jan's background
- 7:10 Timing of scalable solutions
- 16:30 Recursive reward modeling
- 24:30 Amplification of misalignment
- 31:00 Community focus
- 32:55 Wireheading
- 41:30 Arguments against the democratization of AIs
- 49:30 Differences between capabilities and alignment
- 51:15 Research to focus on
- 1:01:45 Formalizing an understanding of personal experience
- 1:04:04 OpenAI hiring
- 1:05:02 Wrap-up
14 snips
Aug 20, 2019 • 33min

AI, Robot

Forget what sci-fi has told you about superintelligent robots that are uncannily human-like; the reality is more prosaic. Inside DeepMind's robotics laboratory, Hannah explores what researchers call 'embodied AI': robot arms that are learning tasks like picking up plastic bricks, which humans find comparatively easy. Discover the cutting-edge challenges of bringing AI and robotics together, and learning from scratch how to perform tasks. She also explores some of the key questions about using AI safely in the real world.

If you have a question or feedback on the series, message us on Twitter (@DeepMind using the hashtag #DMpodcast) or email us at podcast@deepmind.com.

Further reading:
- Blogs on AI safety and further resources from Victoria Krakovna
- The Future of Life Institute: The risks and benefits of AI
- The Wall Street Journal: Protecting Against AI's Existential Threat
- TED Talks: Max Tegmark - How to get empowered, not overpowered, by AI
- Royal Society lecture series sponsored by DeepMind: You & AI
- Nick Bostrom: Superintelligence: Paths, Dangers and Strategies (book)
- OpenAI: Learning from Human Preferences
- DeepMind blog: Learning from human preferences
- DeepMind blog: Learning by playing - how robots can tidy up after themselves
- DeepMind blog: AI safety

Interviewees: Software engineer Jackie Kay and research scientists Murray Shanahan, Victoria Krakovna, Raia Hadsell and Jan Leike.

Credits:
- Presenter: Hannah Fry
- Editor: David Prest
- Senior Producer: Louisa Field
- Producers: Amy Racs, Dan Hardoon
- Binaural Sound: Lucinda Mason-Brown
- Music composition: Eleni Shaw (with help from Sander Dieleman and WaveNet)
- Commissioned by DeepMind

Please like and subscribe on your preferred podcast platform. Want to share feedback? Or have a suggestion for a guest that we should have on next? Leave us a comment on YouTube and stay tuned for future episodes.
May 18, 2024 • 15min

Political Battles at OpenAI, Safety vs. Capability in AI, Superalignment’s Death

Former OpenAI Chief Scientist Ilya Sutskever and Head of Alignment Jan Leike discuss the internal strife at OpenAI, focusing on the debate between safety and capability in AI development. The episode explores philosophical debates on AI capabilities, the establishment of the Superalignment team, and the conflict within the organization that led to key departures.