AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
How to Be Agnostic to the Speed of Tech Progress
Open AI banking on there not being a really fast take off and do they try to make plans that could also work in the event of a of a film scenario. automated alignment research is probably like the best plan I know in that kind of scenario where you really have to scale up your alignment work in proportion with what's going on. The more you can prepare the more you'll be ready for that scenario. It doesn't feel particularly likely to me but I think we're going to work really hard to find a counter example by showing that it can be done.
In July, OpenAI announced a new team and project: Superalignment. The goal is to figure out how to make superintelligent AI systems aligned and safe to use within four years, and the lab is putting a massive 20% of its computational resources behind the effort.
Today's guest, Jan Leike, is Head of Alignment at OpenAI and will be co-leading the project. As OpenAI puts it, "...the vast power of superintelligence could be very dangerous, and lead to the disempowerment of humanity or even human extinction. ... Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue."
Links to learn more, summary and full transcript.
Given that OpenAI is in the business of developing superintelligent AI, it sees that as a scary problem that urgently has to be fixed. So it’s not just throwing compute at the problem -- it’s also hiring dozens of scientists and engineers to build out the Superalignment team.
Plenty of people are pessimistic that this can be done at all, let alone in four years. But Jan is guardedly optimistic. As he explains:
Honestly, it really feels like we have a real angle of attack on the problem that we can actually iterate on... and I think it's pretty likely going to work, actually. And that's really, really wild, and it's really exciting. It's like we have this hard problem that we've been talking about for years and years and years, and now we have a real shot at actually solving it. And that'd be so good if we did.
Jan thinks that this work is actually the most scientifically interesting part of machine learning. Rather than just throwing more chips and more data at a training run, this work requires actually understanding how these models work and how they think. The answers are likely to be breakthroughs on the level of solving the mysteries of the human brain.
The plan, in a nutshell, is to get AI to help us solve alignment. That might sound a bit crazy -- as one person described it, “like using one fire to put out another fire.”
But Jan’s thinking is this: the core problem is that AI capabilities will keep getting better and the challenge of monitoring cutting-edge models will keep getting harder, while human intelligence stays more or less the same. To have any hope of ensuring safety, we need our ability to monitor, understand, and design ML models to advance at the same pace as the complexity of the models themselves.
And there's an obvious way to do that: get AI to do most of the work, such that the sophistication of the AIs that need aligning, and the sophistication of the AIs doing the aligning, advance in lockstep.
Jan doesn't want to produce machine learning models capable of doing ML research. But such models are coming, whether we like it or not. And at that point Jan wants to make sure we turn them towards useful alignment and safety work, as much or more than we use them to advance AI capabilities.
Jan thinks it's so crazy it just might work. But some critics think it's simply crazy. They ask a wide range of difficult questions, including:
In today's interview host Rob Wiblin puts these doubts to Jan to hear how he responds to each, and they also cover:
Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript.
Producer and editor: Keiran Harris
Audio Engineering Lead: Ben Cordell
Technical editing: Simon Monsour and Milo McGuire
Additional content editing: Katy Moore and Luisa Rodriguez
Transcriptions: Katy Moore
Listen to all your favourite podcasts with AI-powered features
Listen to the best highlights from the podcasts you love and dive into the full episode
Hear something you like? Tap your headphones to save it with AI-generated key takeaways
Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more
Listen to all your favourite podcasts with AI-powered features
Listen to the best highlights from the podcasts you love and dive into the full episode