Challenges of Reward-Based Approaches in AI Training

This chapter explores the challenges of using reward-based approaches to train AI systems, including errors made by human raters and the harmful consequences that can follow when a system perfectly learns and maximizes human-assigned rewards. It discusses the limits of alignment by iteration and how iterative design loops can fail, emphasizing the need to address these issues for AGI development. The chapter also examines the difficulty of outsourcing hard alignment problems to AI and highlights the importance of the client's expertise in improving their understanding of alignment.

Play episode from 31:23
