The goal of the Superalignment project is to automate alignment research for AI systems. While aligning superintelligent AI systems directly may be intractable, the project aims to align the next generation of AI systems, which are closer to human-level capabilities. Once these more tractable systems are aligned, they can be used to help solve the alignment problem for even more advanced systems in the future.
The project focuses on two key areas: scalable oversight and generalization. Scalable oversight involves training AI systems to find and evaluate flaws, such as bugs in code, in the outputs of other models. Techniques like discriminators and critique models aim to surface problems that human evaluators would miss. Generalization research studies how AI systems extrapolate beyond their supervised training data, so that they remain aligned with human intent even in situations humans cannot directly check. One approach is to train on easy tasks and measure how well the resulting behavior generalizes to harder ones.
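To make the easy-to-hard generalization idea above concrete, here is a minimal sketch (in Python, using only NumPy) of the kind of measurement involved: train a toy classifier only on "easy" examples and then check its accuracy on "hard" ones. This is purely illustrative and not OpenAI's methodology or code; the toy task, the difficulty thresholds, and the training loop are all assumptions made for the sake of the example.

# Minimal sketch (not OpenAI's code): measuring easy-to-hard generalization.
# A toy classifier is trained only on "easy" examples (far from the true
# decision boundary) and evaluated on "hard" examples (close to it).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: label = sign of a hidden linear function of the input.
d = 20
w_true = rng.normal(size=d)
X = rng.normal(size=(5000, d))
margin = X @ w_true                          # signed distance-like score
y = (margin > 0).astype(float)

# Difficulty split: large |margin| = easy, small |margin| = hard.
easy = np.abs(margin) > 1.0
hard = np.abs(margin) < 0.3

# Fit logistic regression by gradient descent on the easy subset only.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X[easy] @ w)))
    w -= 0.1 * X[easy].T @ (p - y[easy]) / easy.sum()

def accuracy(mask):
    return (((X[mask] @ w) > 0) == (y[mask] > 0.5)).mean()

print(f"accuracy on easy examples: {accuracy(easy):.3f}")
print(f"accuracy on hard examples: {accuracy(hard):.3f}")  # the quantity of interest

The interesting question at scale is the analogue of that last number: whether a strong model supervised only on tasks humans can label ("easy") keeps behaving as intended on tasks humans cannot label ("hard").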
The project acknowledges that the future of AI research will involve models capable of performing research tasks beyond human capabilities. The key challenge is to ensure the trustworthiness of these AI systems, such that they can assist in research without introducing biases or deceptive behavior. By developing techniques to evaluate and align these systems effectively, the project aims to build trust and confidence in AI research outputs, enabling collaboration between humans and AI systems in advancing scientific progress.
One of the main ideas discussed in this episode is the importance of scalable oversight and generalization for alignment research. The speaker emphasizes the need for methods that can be applied in real settings and that generalize from smaller language models to larger, more complex systems like GPT-4. Running experiments and gathering data on generalization gives researchers insight into how neural networks learn and how well alignment techniques carry over to more capable models.
Another key point is the challenge of interpretability in AI systems. The speaker acknowledges the importance of understanding the internal workings and goals of AI models, but raises concerns about how difficult neural networks are to interpret and about the limitations of current interpretability techniques. They also highlight the ethical considerations in aligning AI with human values and the need to evaluate alignment methods reliably, while fostering transparency and expert collaboration in the field.
The conversation also covers the potential risks of AI development and deployment. The speaker stresses the need to carefully evaluate alignment progress and to avoid prematurely deploying misaligned AI systems, emphasizing continuous monitoring, validation, and collaboration across the AI community to ensure safety. The approach taken by OpenAI's Superalignment team focuses on iterative progress: aligning each successive generation of systems, such as GPT-5, and collaborating with other researchers to share insights and expertise. By prioritizing safety and accountability, they aim to align AI models while minimizing potential risks.
The episode discusses the importance of choosing the right metrics and making the right research bets in AI development. It emphasizes the need to keep many people in the loop to check the work of AI models, ensuring alignment and preventing deception. It also covers scaling AI development with virtual AI workers, the heavy reliance this places on compute, and the potential ratio of AI staff to human staff.
The conversation then turns to the challenges of governance, generalization, and ethics in AI development. It explores how to supervise and trust a large number of virtual workers, while also addressing the risks of differential empowerment and misuse of AI. The speakers emphasize the importance of aligning AI with democratic values and ensuring a positive impact on marginalized groups, and they raise concerns about avoiding structural risks so that AI benefits humanity as a whole.
OpenAI is hiring research engineers, research scientists, and research managers for its Superalignment team. They are seeking individuals with a good understanding of current ML technology, the ability to run experiments, and strong critical thinking skills. A machine learning PhD is not required, but a good understanding of ML and experience in alignment research are preferred. The team is looking for highly motivated people who are passionate about solving the alignment problem and contributing to the future of AI.
OpenAI is committed to solving the alignment problem and ensuring the safety of AI systems. They aim to convince the scientific community by providing a mountain of empirical evidence and inviting external experts to scrutinize their work. OpenAI is focused on continuously improving safety and alignment measures as the stakes get higher. They believe that openness and transparency are crucial, and they welcome feedback from the public and experts. OpenAI is also looking for individuals who care about the alignment problem and want to contribute to building a safe and beneficial AI future.
In July, OpenAI announced a new team and project: Superalignment. The goal is to figure out how to make superintelligent AI systems aligned and safe to use within four years, and the lab is putting a massive 20% of its computational resources behind the effort.
Today's guest, Jan Leike, is Head of Alignment at OpenAI and will be co-leading the project. As OpenAI puts it, "...the vast power of superintelligence could be very dangerous, and lead to the disempowerment of humanity or even human extinction. ... Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue."
Links to learn more, summary and full transcript.
Given that OpenAI is in the business of developing superintelligent AI, it sees that as a scary problem that urgently has to be fixed. So it’s not just throwing compute at the problem -- it’s also hiring dozens of scientists and engineers to build out the Superalignment team.
Plenty of people are pessimistic that this can be done at all, let alone in four years. But Jan is guardedly optimistic. As he explains:
Honestly, it really feels like we have a real angle of attack on the problem that we can actually iterate on... and I think it's pretty likely going to work, actually. And that's really, really wild, and it's really exciting. It's like we have this hard problem that we've been talking about for years and years and years, and now we have a real shot at actually solving it. And that'd be so good if we did.
Jan thinks that this work is actually the most scientifically interesting part of machine learning. Rather than just throwing more chips and more data at a training run, this work requires actually understanding how these models work and how they think. The answers are likely to be breakthroughs on the level of solving the mysteries of the human brain.
The plan, in a nutshell, is to get AI to help us solve alignment. That might sound a bit crazy -- as one person described it, “like using one fire to put out another fire.”
But Jan’s thinking is this: the core problem is that AI capabilities will keep getting better and the challenge of monitoring cutting-edge models will keep getting harder, while human intelligence stays more or less the same. To have any hope of ensuring safety, we need our ability to monitor, understand, and design ML models to advance at the same pace as the complexity of the models themselves.
And there's an obvious way to do that: get AI to do most of the work, such that the sophistication of the AIs that need aligning, and the sophistication of the AIs doing the aligning, advance in lockstep.
Jan doesn't want to produce machine learning models capable of doing ML research. But such models are coming, whether we like it or not. And at that point Jan wants to make sure we turn them towards useful alignment and safety work, as much or more than we use them to advance AI capabilities.
Jan thinks it's so crazy it just might work. But some critics think it's simply crazy. They ask a wide range of difficult questions, including:
In today's interview, host Rob Wiblin puts these doubts to Jan to hear how he responds to each, and they also cover:
Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript.
Producer and editor: Keiran Harris
Audio Engineering Lead: Ben Cordell
Technical editing: Simon Monsour and Milo McGuire
Additional content editing: Katy Moore and Luisa Rodriguez
Transcriptions: Katy Moore