AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
The Importance of Scalable Oversight
Are we confident to figure out how to make it work? Are we actually measuring the right thing? I think that's also a lot of things. It's the only way that this plan could not work out that keeps you awake at night, that we haven't already mentioned. We're really trying to hire a lot right now. The real way for us to scale is using EI.
In July, OpenAI announced a new team and project: Superalignment. The goal is to figure out how to make superintelligent AI systems aligned and safe to use within four years, and the lab is putting a massive 20% of its computational resources behind the effort.
Today's guest, Jan Leike, is Head of Alignment at OpenAI and will be co-leading the project. As OpenAI puts it, "...the vast power of superintelligence could be very dangerous, and lead to the disempowerment of humanity or even human extinction. ... Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue."
Links to learn more, summary and full transcript.
Given that OpenAI is in the business of developing superintelligent AI, it sees that as a scary problem that urgently has to be fixed. So it’s not just throwing compute at the problem -- it’s also hiring dozens of scientists and engineers to build out the Superalignment team.
Plenty of people are pessimistic that this can be done at all, let alone in four years. But Jan is guardedly optimistic. As he explains:
Honestly, it really feels like we have a real angle of attack on the problem that we can actually iterate on... and I think it's pretty likely going to work, actually. And that's really, really wild, and it's really exciting. It's like we have this hard problem that we've been talking about for years and years and years, and now we have a real shot at actually solving it. And that'd be so good if we did.
Jan thinks that this work is actually the most scientifically interesting part of machine learning. Rather than just throwing more chips and more data at a training run, this work requires actually understanding how these models work and how they think. The answers are likely to be breakthroughs on the level of solving the mysteries of the human brain.
The plan, in a nutshell, is to get AI to help us solve alignment. That might sound a bit crazy -- as one person described it, “like using one fire to put out another fire.”
But Jan’s thinking is this: the core problem is that AI capabilities will keep getting better and the challenge of monitoring cutting-edge models will keep getting harder, while human intelligence stays more or less the same. To have any hope of ensuring safety, we need our ability to monitor, understand, and design ML models to advance at the same pace as the complexity of the models themselves.
And there's an obvious way to do that: get AI to do most of the work, such that the sophistication of the AIs that need aligning, and the sophistication of the AIs doing the aligning, advance in lockstep.
Jan doesn't want to produce machine learning models capable of doing ML research. But such models are coming, whether we like it or not. And at that point Jan wants to make sure we turn them towards useful alignment and safety work, as much or more than we use them to advance AI capabilities.
Jan thinks it's so crazy it just might work. But some critics think it's simply crazy. They ask a wide range of difficult questions, including:
In today's interview host Rob Wiblin puts these doubts to Jan to hear how he responds to each, and they also cover:
Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript.
Producer and editor: Keiran Harris
Audio Engineering Lead: Ben Cordell
Technical editing: Simon Monsour and Milo McGuire
Additional content editing: Katy Moore and Luisa Rodriguez
Transcriptions: Katy Moore
Listen to all your favourite podcasts with AI-powered features
Listen to the best highlights from the podcasts you love and dive into the full episode
Hear something you like? Tap your headphones to save it with AI-generated key takeaways
Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more
Listen to all your favourite podcasts with AI-powered features
Listen to the best highlights from the podcasts you love and dive into the full episode