
The Plan - 2023 Version
LessWrong (Curated & Popular)
Challenges of Reward-Based Approaches in AI Training
This chapter explores the challenges of using reward-based approaches to train AI systems, including errors made by human raters and the harmful consequences of an AI perfectly learning and maximizing human-assigned rewards. It discusses the limitations of alignment by iteration and the failure of iterative design loops, emphasizing the need to address these issues for AGI development. The chapter also examines the challenges of outsourcing hard alignment problems to AI and highlights the importance of the client's own expertise and understanding of alignment.
Transcript


